Rate-controlled optical burst switching

ABSTRACT

The invention provides a method and network communication equipment for low latency loss-free burst switching. Burst-transfer schedules are determined by controllers of bufferless core nodes according to specified bitrate allocations and distributed to respective edge nodes. In a composite-star network, burst schedules are initiated by any core node. Burst formation takes place at source edge nodes and a permissible burst size is determined according to an allocated bitrate of a burst stream to which the burst belongs. The permissible burst size is subject to constraints such as permissible burst-formation delay, a minimum guard-time requirement, and permissible delay jitter. A method of control-burst exchange between each edge node and each bufferless core node enables burst scheduling, time coordination, and loss-free burst switching. Both the payload bursts and control bursts are carried by optical channels connecting the edge nodes and the core nodes.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 13/792,825, filed Mar. 11, 2013, entitled RATE-CONTROLLED OPTICAL BURST SWITCHING, which is a continuation of U.S. patent application Ser. No. 11/625,949, filed Jan. 23, 2007, now U.S. Pat. No. 8,406,246, issued Mar. 26, 2013, entitled RATE-CONTROLLED OPTICAL BURST SWITCHING, which is a continuation of U.S. patent application Ser. No. 10/054,512, filed Nov. 13, 2001, now U.S. Pat. No. 7,187,654, issued Mar. 6, 2007, entitled RATE-CONTROLLED OPTICAL BURST SWITCHING, the entireties of which are incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

n/a

FIELD OF THE INVENTION

The present invention relates to data networks and, in particular, to burst switching in an optical-core network.

BACKGROUND OF THE INVENTION

A data network comprises a number of source nodes, each source node receiving traffic from numerous traffic sources, and a number of sink nodes, each sink node delivering data to numerous traffic sinks. The source nodes can be connected to the sink nodes directly or through core nodes. Source nodes and sink nodes are often paired to form edge nodes, where a source node and sink node of an edge node share memory and control.

Each link between two nodes may comprise multiple channels. An optical multi-channel link uses Wavelength Division Multiplexing (WDM). WDM allows a given optical link to be divided into multiple channels, where a distinct stream of data may be transmitted on each channel and a different wavelength of light is used as a carrier wave to form each of the multiple channels within the optical link.

The performance, efficiency, and scalability of a telecommunications network depend heavily on the nodal degree and the directly related network diameter. The degree of a specific node is a measure of the number of nodes to which the specific node directly connects. The term topological reach is used herein to refer to the number of sink nodes that a source node can reach directly or through the network core. The diameter of a network is a measure of the maximum number of hops along the shortest path between any two nodes. For a given network capacity, the higher the nodal degree, the smaller the network diameter becomes, and a small network diameter generally yields high performance and high efficiency. On the other hand, for a given nodal degree, scalability generally increases with the network diameter, but to the detriment of network efficiency. It is therefore advantageous to increase the nodal degree to the highest limit that technology permits.

In a network based on channel switching, a source node connects to destination sink nodes through channels, each channel being associated with a wavelength. The topological reach of a source node, i.e., the number of destination sink nodes that the source node can reach without switching at an intermediate edge node, is then limited by the number of channels emanating from the source node, which is typically significantly smaller than the number of edge nodes in the network. Time-sharing enables fine switching granularity and, hence, a high topological reach. Effective time-sharing in a bufferless-core network requires that the edge nodes be time-locked to the core nodes, that all nodes be fast-switching, and that a path between two edge nodes traverses a single optical core node. A node X is said to be time-locked to a node Y if, at any instant of time, the reading of a time-counter at node X equals the sum of a reading of an identical time-counter at node Y and the propagation time from node X to node Y, where the time counters at nodes X and Y have the same period, and the propagation delay is measured relative to said period. Thus, if each of several edge nodes transmits a pulse, when its time-counter reading is 't', to a specific core node, the pulses from the edge nodes arrive at the core node when the time-counter reading of the core node is also 't'.

TDM (time-division multiplexing) and burst switching are two modes of network time sharing. In TDM, data is organized in a time-slotted frame of a predefined duration and a path from a source node to a sink node may be allocated one or more time slots. In burst switching, data packets are aggregated into bursts, generally of different sizes, and the bursts are switched in the core towards destination sink nodes, where each burst is disassembled into constituent packets. Both TDM and burst switching can be exploited to increase the nodal degree, hence reduce the network diameter. The application of TDM in an optical-core network is described in Applicant's U.S. patent application Ser. No. 09/960,959, filed on Sep. 25, 2001 and titled "Switched channel-band Network," which is incorporated herein by reference.

Prior-art burst switching has attractive features but has two main drawbacks: burst-transfer latency and burst loss. In a closed-loop scheme, a source node sends a request to a core node for transferring a burst, the request including a destination and size of the burst, and waits for a message from the core node, where the message acknowledges that the optical switch in the core node is properly configured, before sending the burst. In an open-loop scheme, the burst follows the burst-transfer request after a predetermined time period, presumably sufficient to schedule the burst transfer across the core, and it is expected that, when the burst arrives at the core node, the optical switch will have been properly configured by a controller of a core node. It is noted that even if a very long time gap is kept between a burst-transfer request and the data burst itself, the lack of buffers at the core node may result in burst loss and a significant idle time.

In the closed-loop scheme, the time delay involved in sending a burst-transfer request and receiving an acceptance before sending a burst may be unacceptably high, leading to idle waiting periods and low network utilization in addition to requiring large storage at the edge nodes.

In the open-loop scheme, a burst may arrive at a core node before the optical switch can be configured to switch the burst and the burst may be lost. Furthermore, the fact that the burst has been lost at the core node remains unknown to the source node for some time and a lost burst would have to be sent again after a predefined interval of time.

In a wide-coverage network, the round-trip propagation delay from an edge node, comprising a paired source node and a sink node, to a core node can be of the order of tens of milliseconds. This renders closed-loop burst scheduling inappropriate. In closed-loop switching, a source node and a core node must exchange messages to determine the transmission time of each burst. The high round-trip delay requires that the source node have a sizeable buffer storage. On the other hand, open-loop burst scheduling, which overcomes the delay problem, can result in substantial burst loss due to unresolved contention at the core nodes. It is desirable that data-burst formation at the source nodes and subsequent transfer to respective optical core nodes be performed with low delay, and that burst transfer across the core be strictly loss-free. It is also desirable that the processing effort and transport overhead be negligibly small.

A burst scheduling method and a mechanism for burst transfer in a composite-star network is described in the applicant's U.S. patent application Ser. No. 09/750,071, filed on Dec. 29, 2000, and titled "Burst Switching in a High-Capacity Network", the contents of which are incorporated herein by reference. According to the method, a burst-transfer request is sent to a controller of a core node after a burst has been formed at a source node. High efficiency is, however, maintained by burst scheduling and burst-transfer pipelining. The burst transfer across the optical core is loss-free. However, a burst has to wait at its source node for a period of time slightly exceeding a round-trip delay between the source node and a selected core node. In a network of global coverage, the burst-transfer latency may exceed a high value, 20 milliseconds for example, for a significant proportion of the traffic.

SUMMARY OF THE INVENTION

In a network having electronic edge nodes and optical core nodes, each core node has a capability to switch data bursts of variable sizes. The data bursts received at a core node are generated at source nodes generally having substantially different propagation delays to the core node, and the present invention provides a burst-switching method and apparatus to enable a high-performance burst-switching mode.

In accordance with an aspect of the present invention, a method is provided for burst communications wherein a core node of the network distributes timed burst-transfer permits to edge nodes, and each edge node assembles data into bursts as indicated by respective permits and transmits the bursts according to the permit schedule. In a related aspect, the burst sizes and burst-transfer rates are determined as functions of bitrate allocations for burst streams.

In accordance with another aspect of the present invention, a method is provided for burst specification and scheduling wherein burst schedules are initiated by a bufferless core node and distributed to respective edge nodes. In a related aspect, there is provided a method for burst switching in which burst schedules are initiated by any of a plurality of bufferless core nodes and distributed to respective edge nodes.

In accordance with a further aspect of the present invention there is provided a method of burst generation wherein a burst size is determined according to an allocated bitrate of a respective burst stream. In a related aspect, an allocated bitrate of a burst stream is modified according to observed usage of scheduled bursts of said burst stream.

In accordance with yet another aspect of the present invention there is provided a method of control-burst exchange between each of a plurality of edge nodes and each of a plurality of bufferless core nodes. Both payload bursts and control bursts share the optical channels connecting the edge nodes and the core nodes.

In accordance with a further aspect of the present invention there is provided a method of time locking a source node to a core node in a burst-switching network. In a related aspect, control bursts include timing data that are exchanged between a source node and a core node.

In accordance with a further aspect, there is provided a core node having a plurality of optical switches, each optical switch including a plurality of input ports and a plurality of output ports, wherein said core node receives data traffic from each of a plurality of source nodes through a number of input ports of which at least one is operated in a burst mode. In a supplementary aspect, said number of input ports can belong to any number of said plurality of optical switches.

BRIEF DESCRIPTION OF THE DRAWINGS

In the figures which illustrate example embodiments of this invention:

FIG. 1 illustrates a composite-star network for use with an embodiment of the present invention;

FIG. 2 illustrates a parallel-plane optical core node for use with an embodiment of the present invention;

FIG. 3 illustrates an optical switch with associated master controller and slave controller for use with an embodiment of the present invention;

FIG. 4 illustrates the coexistence of channel and burst switching in the optical switch illustrated in FIG. 3 for use with an embodiment of the present invention;

FIG. 5 illustrates the exchange of messages between an edge node and a core node in the network illustrated in FIG. 1 for burst-schedule generation, according to an embodiment of the present invention;

FIG. 6 illustrates the exchange of messages between an edge node and a core node in the network illustrated in FIG. 1 for burst-schedule generation, according to an embodiment of the present invention;

FIG. 7 illustrates the exchange of messages between two edge nodes and a core node in the network illustrated in FIG. 1 for burst-schedule generation, according to an embodiment of the present invention;

FIG. 8 illustrates the dependence of a preferred burst size on a bitrate allocation of a respective burst stream, according to an embodiment of the present invention;

FIG. 9 is an example of preferred burst sizes corresponding to different bitrate allocations, according to an embodiment of the present invention;

FIG. 10 illustrates two upstream burst sequences sent by an edge node, the first sequence sent under normal conditions and the second sequence sent during a time-locking recovery phase, in accordance with one of the embodiments of the present invention;

FIG. 11 illustrates two main control elements, specifically a time-locking circuit and a master burst scheduler, within the master controller of an optical space switch, according to an embodiment of the present invention;

FIG. 12 illustrates a time-counter period, a reconfiguration period, and a schedule period, according to an embodiment of the present invention;

FIG. 13 illustrates an upstream control burst, according to an embodiment of the present invention;

FIG. 14 illustrates a downstream control burst, according to an embodiment of the present invention;

FIG. 15 is a flow chart illustrating the main steps of time-locking recovery, in accordance with one of the embodiments of the present invention;

FIG. 16 illustrates an alternative arrangement for initiating and recovering time-locking between edge nodes and an optical switch in a core node, in accordance with one of the embodiments of the present invention;

FIG. 17 illustrates an implementation of the arrangement of FIG. 16;

FIG. 18 illustrates the temporal arrangement of upstream and downstream control bursts in optical channels, according to an embodiment of the present invention;

FIG. 19 illustrates the relative position of a timing control burst within a time-counter cycle, according to an embodiment of the present invention;

FIG. 20 illustrates an optical node having four optical switches where some input ports in each optical switch are operated in a channel-switching mode and others are operated in a burst-switching mode, according to an embodiment of the present invention;

FIG. 21 illustrates a device for generating burst descriptors of bitrate-regulated burst streams associated with a plurality of source nodes for use with an embodiment of the present invention;

FIG. 22 illustrates a master burst scheduler, including a burst-scheduling kernel, a burst-descriptor memory, an input-state memory, an output-state memory, and a permits buffer, according to an embodiment of the present invention;

FIG. 23 illustrates an enhanced master burst scheduler where several burst-descriptor memories and several output-state memories are used to speed up the scheduling process, according to an embodiment of the present invention;

FIG. 24 illustrates further details of the enhanced master burst scheduler of FIG. 23, according to an embodiment of the present invention;

FIG. 25 illustrates input-state and output-state arrays for use with an embodiment of the present invention;

FIG. 26 illustrates a method for scaling a burst scheduler, in accordance with an embodiment of the present invention;

FIG. 27 illustrates an alternative method for scaling a burst scheduler, in accordance with an embodiment of the present invention;

FIG. 28 illustrates front-end burst scheduling (FIG. 28a) and trailing-end burst scheduling (FIG. 28b) in a time-slotted frame for use with an embodiment of the present invention;

FIG. 29 illustrates a source node and a sink node for use with an embodiment of the present invention;

FIG. 30 illustrates an edge node comprising a source node and a sink node that share a common switching fabric for use with an embodiment of the present invention;

FIG. 31 illustrates an apparatus for burst formation, including an enqueueing controller, a dequeueing controller, memory devices, and a burst-transfer scheduler for use with an embodiment of the present invention;

FIG. 32 illustrates the organization of the memory devices of FIG. 31;

FIG. 33 is a flow chart describing the functional steps of packet concatenation at an output port of a source node to form data bursts for use with an embodiment of the present invention; and

FIG. 34 is a flow chart showing the steps leading to the transfer of bursts from a source node for use with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A star network's main attraction is its high performance and simplicity of control. However, it is suitable only for limited geographic or topological coverage. A composite star network 100, illustrated in FIG. 1, may be viewed as a superposition of several star networks which are merged only at the edge nodes 120 while the core nodes 140 can be widely distributed and independent. An edge node 120 comprises a source node 120A and an associated sink node 120B. Hereinafter, reference to an edge node 120 also implies reference to the source node 120A and the sink node 120B that constitute the edge node 120. Similarly, reference to a source node 120A or a sink node 120B implies reference to the edge node 120 to which either belongs.

The core nodes 140 of a composite-star network are not connected to each other. The composite-star network 100 retains the attractive properties of a star network while providing a wide geographic and topological coverage. The composite-star network 100 will be used for the purpose of describing embodiments of the present invention. A star network is treated as a component of a composite-star network. Unless otherwise stated, reference to a connection from a source node to a sink node excludes an internal connection within an edge node, i.e., from a source node to its associated sink node. Hereinafter, an upstream data burst is defined as a burst sent from a source node to a core node, and a downstream data burst is a burst sent from a core node to a sink node. Likewise, the flow of data bursts from a source node to a core node is called a burst upstream and the flow of data bursts from a core node to a sink node is called a burst downstream.

Hereinafter, any two edge nodes are said to constitute a node pair. A node pair is directed so that data traffic flows from the source node 120A of a first edge node 120 to the sink node 120B of a second edge node. The term node-pair traffic refers to the total traffic demand, expressed in bits per second, that a first edge node (source node) intends to transfer to a second edge node (sink node). A burst stream is defined by a source node 120A, a sink node 120B, and a path from said source node 120A to said sink node 120B. A burst stream from a source node to a sink node comprises a burst upstream and a burst downstream. Where the burst traffic from a source node 120A of a first edge node 120 is transferred to a sink node 120B of a second edge node 120 through two or more paths, each of said two or more paths defines a separate burst stream. The node-pair burst traffic from a source node 120A to a sink node 120B can be divided into multiple burst streams due to the vacancy distribution in a plurality of paths or if the bitrate requirement of said burst traffic exceeds the capacity of a single path.

Each burst stream may comprise several individual connections of different bitrate requirements. Each connection is defined by a data source served by a source node 120A and a data sink served by a sink node 120B. The connections within a burst stream may have distinctly different bitrate and service requirements.

The spectral capacity (the bandwidth) of an optical fiber link can be divided into channels, each corresponding to a modulated carrier wavelength; a modulated wavelength gives rise to a channel. A channel may have a capacity of 10 Gb/s, for example. Although a channel occupies a spectral band, it is customary, for brevity, to refer to both a carrier wavelength and the channel it carries simply as a wavelength.

The preferred core node 140 of a composite-star network 100 comprises parallel space switches 220, as illustrated in FIG. 2. A space switch 220 has a bufferless fabric which may be electronic or photonic. The core node 140 switches channels of upstream WDM links 210 to channels of downstream WDM links 230. Each optical switch 220 is operated to switch channels of the same wavelength. A data burst from a source node 120A to a sink node 120B may be transferred through any optical switch 220 in any core node 140 connecting the source node to the sink node. Hereinafter, the terms optical switch and optical space switch are used interchangeably.

It is noted that conventional WDM demultiplexers 212 and WDM multiplexers 226 need be used at the input and output of each multi-plane core node. They are not further described, their use being well-known in the art.

There are several core nodes 140 in the network of FIG. 1, and the core nodes operate totally independently. The parallel optical switches 220 in the core node 140 of FIG. 2 also operate independently. Initially, each source node 120A selects at least one of the core nodes 140 through which traffic destined to a given sink node 120B is routed. To select a path to a destination sink node 120B, a source node 120A selects a core node for a connection in such a way that promotes load balancing while taking into account the propagation delay of the path. A composite index calculated as a function of both a path vacancy and the path's propagation delay can be used to distribute the traffic load.

The traffic directed to a specific sink node 120B may be carried by any of the channels of the multi-channel link 210 (WDM fiber link) from the source node 120A to the selected core node 140. A load-balancing algorithm to balance the traffic load among the links 210 and 230 can be used to increase the throughput. Successive bursts to the same sink node 120B may use different channels (different wavelengths), and hence be switched in different optical switches 220 in a core node 140. It is preferable, however, to distribute burst-switched connections evenly among the optical switches 220 of an optical core node 140 in such a way that the bursts of each connection use the same optical switch 220.

In a prior-art burst-scheduling process, a controller of an optical switch receives burst descriptors from the source nodes and schedules the burst switching times. In a distinct departure, according to an embodiment of the present invention, the burst descriptors are generated by a master controller 240 of an optical switch 220, the switching times of the corresponding bursts are scheduled, and edge-node-specific burst-transfer permits are distributed to the respective edge nodes 120. The burst-descriptor generation is based on burst-stream bitrate allocations defined by the source nodes 120A. A source node 120A determines the bitrate requirement for burst streams either according to explicit specification by the traffic sources or by an adaptive means based on monitoring usage and/or observing the occupancy fluctuation of data-burst buffers.

FIG. 3 illustrates a space switch having N input ports 314 and N output ports 384, N>1. This represents one of the optical switches 220 of the multiple-plane optical core node 140 of FIG. 2. Each input port 314 has a receiver and each output port 384 has a transmitter. The input ports 314 receive data from source nodes (not illustrated) through incoming WDM links 210, which are demultiplexed into channels 214, and the output ports 384 transmit data to sink nodes (not illustrated) through channels 224. The interconnection of input ports 314 to output ports 384 is effected by a slave controller 250 associated with the optical switch 220. A master controller 240 determines the connectivity pattern of input ports 314 to output ports 384 and communicates the connectivity pattern to the slave controller 250. Each source node 120A has at least one time counter and the master controller 240 has a master time counter. All time counters have the same period as the master time counter. Both the master controller 240 and the slave controller 250 are predominantly hardware operated to realize high-speed control. In a core node 140 having several optical switches 220, as illustrated in FIG. 2, each optical switch should preferably have its own master controller 240 and slave controller 250. Also, as will be described later with reference to time-locking requirements, a source node 120A may be time-locked separately to each of the plurality of optical switches 220, because of the different propagation delays experienced by channels of different wavelengths in a link 210 connecting a source node 120A to a core node 140.

Each input port 314 has a receiver operable to receive an optical signal from an optical channel and each output port 384 has a transmitter which is operable to transmit an optical signal through an optical channel. The N input ports 314 of an optical switch 220 can simultaneously receive N optical signals and the N output ports 384 of an optical switch 220 can simultaneously transmit N optical signals.

The optical switch 220 has input ports 314 labeled A₀ to AN and output ports 384 labeled B₀ to BN, where input port A₀ is a control input port and output port B₀ is a control output port, while the rest of the ports, A₁ to AN and B₁ to BN, are payload ports. The master controller sends control messages to any of output ports B₁ to BN through an E/O (electrical-to-optical) interface 316, control input port A₀, and the optical switch 220. The master controller receives control messages from input ports A₁ to AN through the optical switch 220, control output port B₀, and an O/E (optical-to-electrical) interface 386.

Data bursts are received from any upstream link 210, each data burst being destined to a specified output port Bx, 1 ≤ x ≤ N. Some bursts, hereinafter called control bursts, are destined to the master controller 240. The control bursts carried by the N incoming channels 214 are staggered so that the master controller 240 receives, through control output port B₀, the content of each control burst one at a time. The control bursts are preferably of equal size. It is noted that the upstream control bursts constitute one of the burst streams for which a bitrate is allocated. A control burst is likely to be much shorter than a typical payload burst.

FIG. 4 illustrates the space switch of FIG. 3 with channel switching applied to some pairs of input and output ports 314/384 and burst switching applied to the other input-output pairs 314/384. The node-pair bitrate requirements received at a core node 140 may have a large variance, where one node pair may require a capacity of several channels while another node pair may require a small fraction of the capacity of a channel. The bitrate requirements may also change considerably with time. It is preferable, therefore, to establish a mixture of channel paths and burst paths within the same optical switch and to provide means, at respective edge nodes 120, for rapidly modifying the paths' granularities, from burst to channel or vice versa, as the traffic pattern changes. Although all input ports 314 can be identical, an input port 314 through which a channel is switched to an output port in a unicast transfer, or to multiple output ports in a multicast transfer, is called a channel-mode input port, and an input port 314 through which individual bursts are switched to a plurality of output ports is called a burst-mode input port.

A master controller 240 of one of the optical switches 220 of a core node 140 is designated to function as a core-node controller 240A, in addition to its function as a master controller for its own optical switch 220. The core-node controller 240A collects all the bitrate-allocation requests from all source nodes 120A to which the core node 140 is connected and produces a bitrate-allocation matrix, having N×N entries, that contains all the bitrate requirements from source nodes 120A to sink nodes 120B. Each row in the matrix corresponds to a source node, each column corresponds to a sink node, and the sum of any column in the matrix must not exceed the capacity of the paths from the core node to the corresponding sink node. Satisfying this condition may result in adjusting or rejecting some of the bitrate-allocation requests as will be described below. The selection of entries to be adjusted or rejected is a matter of network-management policy.
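By way of illustration only, the column-sum admission check described above may be sketched as follows; the proportional scale-down is merely one possible network-management policy, and all names are assumptions rather than part of the original disclosure.

    def admit_allocations(requests, sink_capacity):
        # requests[i][j]: requested bitrate (b/s) from source node i to sink node j.
        # sink_capacity[j]: capacity (b/s) of the paths from this core node to sink node j.
        n = len(requests)
        granted = [row[:] for row in requests]
        for j in range(n):
            column_sum = sum(requests[i][j] for i in range(n))
            if column_sum > sink_capacity[j]:
                scale = sink_capacity[j] / column_sum   # adjust (rather than reject) requests
                for i in range(n):
                    granted[i][j] = requests[i][j] * scale
        return granted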

The master controllers 240 of the optical switches 220 of a given core node 140 are interconnected by an internal bus (not illustrated). Each master controller 240 has at least one dual port 221 (FIG. 2) that includes a sender and a receiver to enable communications with other master controllers through said internal bus. In a given core node 140, the master controller 240 designated as a core-node controller 240A receives the bitrate-allocation requests from each edge node 120 that connects to the core node 140.

Each source node 120A determines the required bitrate allocation for its traffic destined to each sink node 120B, selects a core node 140, and sends a bitrate-allocation request to the core-node controller 240A of the selected core node 140, which verifies the availability or otherwise of paths having a sufficient vacancy to accommodate the required bitrate and sends a reply to the edge node. A path between a source node 120A and a sink node 120B is defined by a selected space switch 220 in a selected core node 140. A core-node controller 240A may divide the bitrate requirement of a node pair among several space switches 220 of the core node 140. If the bitrate-allocation request is accepted, the reply includes, directly or indirectly, the identity of the space switch 220 selected to define a burst stream to the destination sink node.

The core-node controller 240A performs the function of admission control by ensuring that the total bitrate allocation for each output port 384 in each of the optical switches 220 of the core node 140 does not exceed the capacity of the output port 384 or the capacity of the downstream channel 224 emanating from the output port 384. The core-node controller 240A selects at least one optical switch 220, then communicates bitrate allocations to respective master controllers 240.

The bitrate allocations of each master controller 240 are used to generate burst descriptors. A burst descriptor includes a burst size and an inter-burst interval. Both the burst size and the inter-burst interval are determined according to the required bitrate allocation. The generated burst descriptors are placed in a buffer where they wait to be scheduled for switching as will be described with reference to FIGS. 22 to 24. A scheduling algorithm is exercised at a master controller 240 of an optical switch 220 to determine the time at which each burst must be received at its respective input port in the optical switch 220. With time-locking, as will be described in detail below, an indication of the relative time at which the start of a burst is received at a specific port is identical to an indication of the relative time at which the start of the burst is transmitted from the respective source node 120A. The time schedules of the bursts over a given interval, called the scheduling interval, are communicated to respective edge nodes 120. These are communicated in the form of burst-transfer permits that are derived from the generated schedule. The duration of the scheduling interval is dictated by the execution time of the scheduling algorithm used. The interval between successive schedule computations is called a reconfiguration interval. The minimum reconfiguration interval equals the scheduling interval. In order to reduce the processing effort, as will be described later with reference to FIGS. 26 and 27, the reconfiguration interval may exceed, and preferably be an integer multiple of, the scheduling interval.
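As an illustrative sketch (not part of the original specification), a burst descriptor could be derived from a bitrate allocation as follows; the bounds b1 and b2 anticipate the burst-size constraints discussed later with reference to FIG. 8, and all names are assumptions.

    def burst_descriptor(r, d0, b1, b2):
        # r:  allocated bitrate of the burst stream, bits per second
        # d0: permissible burst-formation delay, seconds
        # b1, b2: minimum and maximum permissible burst sizes, bits
        size = min(max(r * d0, b1), b2)   # burst formed over at most d0 seconds
        interval = size / r               # inter-burst spacing that sustains the bitrate r
        return size, interval             # one burst descriptor: (size, spacing)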

FIG. 5 illustrates the message exchange between one of a plurality of edge nodes 120 and a core node 140 in order to generate edge-node-specific burst-transfer permits. An edge node 120 sends a vector having N entries, N being the number of ports of the optical switch 220, where each entry corresponds to a sink node 120B and contains a required bitrate allocation for the aggregate burst traffic from the source node 120A to a respective sink node 120B through a core node 140. The edge node 120 ensures that the sum of the vector entries does not exceed the capacity of the paths from the source node to the core node.

The message exchange illustrated in FIG. 5 relates to a case where the edge nodes are collocated with a core node, thus forming a high-capacity burst switch in which the propagation delays among edge nodes 120 and core nodes 140 are negligible. Each edge node requests a bitrate allocation to other edge nodes. A requested bitrate allocation is granted only if paths having a sufficient vacancy are found. An edge node 120 sends a message 530 to a core node 140. The message 530 is embedded in an upstream control burst indicating a required bitrate as will be described below with reference to FIG. 10 and FIG. 13. The core node 140 replies with a message 540 that includes burst-transfer permits to be described below with reference to FIG. 14. Each edge-node-specific burst-transfer permit includes a burst size, a transfer time, and a destination sink node. The reply 540 follows the request message 530 after a period of time that exceeds a scheduling period 580. The duration of the scheduling period 580 is determined by the master controller 240 of the optical switch 220 selected to route the burst data.

In a distributed network, the edge nodes may be geographically dispersed with varying propagation delays to the core node. FIG. 6 illustrates a case where there is a significant propagation delay between an edge node 120 and a core node 140. The edge node 120 sends new bitrate-allocation requests 530 periodically to a master controller 240 and the master controller 240 sends burst-transfer permits 540 to the edge node 120. The requested bitrate allocations may be modified due to output contention at the optical switches 220. Due to the propagation delay, the upstream control bursts and downstream control bursts may be concurrent as indicated in FIG. 6, where a request 530B and a reply 540A to a previous request 530A propagate through the network simultaneously.

FIG. 7 illustrates the exchange of messages between a master controller 240 and two edge nodes 120 in order to enable core reconfiguration. The need for core reconfiguration is preferably assessed periodically. As indicated in FIG. 7, the edge node 120 labeled E-1 sends a bitrate-request vector to a core-node controller 240A of a core node 140. The bitrate-request vector has one entry for each bitrate-allocation request emanating from edge node E-1.

As described above, the aggregate traffic for a node pair may be divided into several burst streams, and a burst stream may constitute several connections defined by a data source and a data sink. A data stream may also constitute several sub-streams distinguished by some property, such as burstiness, or an attribute such as a service class. The number of data sub-streams may exceed the number of sink nodes, where several data sub-streams may be sent from edge node E-1 to a single sink node. For the purpose of illustrating the methods of the present invention, a master controller 240 need not be aware of such a division and only the aggregate bitrate-allocation requests from edge node E-1 to each output port 384 of the optical switch 220 need be considered.

If the core-node controller 240A of a core node 140 decides to allocate a bitrate lower than the bitrate requested by a node pair, it is the duty of the edge node 120 to determine which of a plurality of individual connections that constitute the aggregate node-pair traffic should be affected. Similarly, an edge node E-2 sends its bitrate-request vector to the master controller 240.

The timing of sending the bitrate-request vectors from each of the plurality of edge nodes (source nodes) should be coordinated so that all the requests arrive at the master controller before the start of the reconfiguration process by a relatively short time, as illustrated in FIG. 7. This would ensure that the reconfiguration, i.e., the generation of new burst-transfer permits, is conducted according to the most recent bitrate requests. In order to realize this coordination, each edge node (E-1, E-2, etc.) must be time-locked to the optical switch 220, as will be detailed below in conjunction with FIGS. 10-13, and the core-node controller 240A must send to each edge node a time-counter reading at which all edge nodes should start sending their bitrate-allocation requests.

To produce edge-node-specific burst-transfer permits, the generated burst descriptors need be scheduled. The scheduler at a master controller 240 of an optical switch 220 in an optical core node 140 processes the bitrate allocations, as determined by the core-node controller 240A, at the beginning of each schedule-computation period. In order to base the schedule on the most recent bitrate-allocation requests, each source node 120A should set the time of transmitting its bitrate-allocation request vector so that it would arrive at the core node 140 shortly (a few microseconds) before the start of the schedule-computation period.

Burst Formation

The packet data at each output port (not illustrated) of a source node are sorted into queues according to destination sink nodes and the packet data of each queue are aggregated into bursts as will be described below with reference to FIG. 31 and FIG. 32.

A burst-formation period (burst-formation delay) is defined hereinafter as the time required to assemble a burst at a queue in an output port of the source node 120A, where data is dequeued at a speed specific to the queue. The channel-access delay is the time required to transmit a burst through an optical channel.

FIG. 8 illustrates the relation between the preferred burst size and the bitrate of a burst stream. An upper bound 832 of a burst size is selected to avoid high delay in accessing an optical channel 214 from an output port of a source node 120A to an optical switch 220 in the optical core node 140. Selecting a maximum burst duration in an optical channel of a nominal capacity of 10 Gb/s to be 32 microseconds, for example, yields a maximum burst size of 320 kilobits (40 kilobytes). The burst duration is limited in order to limit the delay jitter. At a source node 120A, a burst is formed at an output port (not illustrated) where data is sorted into queues, each of which corresponds to a destination sink node. With a combined bitrate of all data at an output port of 10 Gb/s, for example, the bitrate allocation for a specific queue may vary between zero and 10 Gb/s. For a queue allocated a bitrate of r bits/second, a burst size b would require a burst-formation period d=b/r. With b=320,000 bits and r=1 megabit/second, the burst-formation period would be 320 milliseconds, which is considered excessive. If the permissible maximum burst-formation period, hereinafter denoted D₀, is selected to be 1 millisecond, then the burst size, b, should not exceed 1000 bits (b=r×D₀). With a 10 Gb/s optical channel 214, the channel-access duration of a 1000-bit burst is only 0.1 microseconds, which may be too small considering the switching latency within the optical switch 220 and potential timing imperfection in the process of time-coordination of a source node 120A and an optical switch 220, as will be described in more detail below. A more appropriate minimum burst size 822 would be 10 kilobits, which corresponds to a channel-access duration of one microsecond for a 10 Gb/s channel. Selecting an upper bound of the burst-formation period to be one millisecond, the burst size for a burst stream allocated 8 Gb/s, for example, would be limited to b=8 megabits. This corresponds to a channel-access duration of 800 microseconds, for a channel speed of 10 Gb/s. Such a high channel-access duration may result in delay jitter, as is well known from simple queuing analysis.

The selection of the upper bound D₀ of the burst-formation delay can be determined according to a specified class of service. For example, the value of D₀ may be 10 milliseconds for a delay-tolerant burst stream but 0.5 milliseconds for a delay-sensitive burst stream. The value of D₀ influences the selection of burst size as described above.

Thus, the minimum burst size 822 should be selected so that a burst's optical-channel access duration is larger than a threshold D₁, which is selected to be an order of magnitude larger than the sum of the switching latency in the optical switch 220 and the timing error where a signal arrival time deviates from a designated arrival time at a core node. The selection of D₁ is also influenced by the need to reduce processing effort. The maximum burst size should be selected so as not to result in exceeding a specified upper bound, D₂, of the optical-channel access duration, or an upper bound, D₀, of the burst-formation period. A reasonable value for D₂ would be 32 microseconds. It is noted that D₀ is allowed to be much higher than D₂ because the formation delay of a burst does not affect other bursts, while a large D₂ causes delay jitter to subsequent bursts. Delay jitter occurs when a burst waiting in a queue at an input of a channel has to wait for a large period of time for another burst accessing the channel. FIG. 8 indicates the preferable burst sizes for two cases, 826A and 826B, where in one case, 826A, the upper bound, D₀, of the burst-formation period is assigned one millisecond, and in the other case, 826B, it is assigned two milliseconds, with D₁=1 microsecond and D₂=8 microseconds in both cases. A large burst-formation period generally increases the mean burst size, and, hence, increases the buffer-size requirement at a source node. On the other hand, a large mean burst size reduces the transport overhead and the processing effort.

In summary, at a source node 120A, a burst size has a lower limit 822 determined by a prescribed minimum burst duration D₁ in the optical channel connecting the source node to the core node, and an upper limit 832 determined by either a permissible burst-formation delay D₀ or a permissible maximum burst duration D₂ in the optical channel connecting a source node to the core node.

Denoting the lower bound and upper bound of the burst size b as B₁ and B₂ respectively, i.e., B₁ ≤ b ≤ B₂, then B₁=R×D₁ and B₂=R×D₂, and the allocated bitrate r for a burst stream must exceed a lower bound: r ≥ R×D₁/D₀, R being the channel capacity in bits per second.

Consider, for example, the case where R=10 Gb/s, D₀=1 millisecond, D₁=1 microsecond, D₂=32 microseconds, and a specified r=1 Mb/s. The value of r must be selected to be at least equal to R×D₁/D₀=10 Mb/s. Thus, to meet the formation-delay upper bound, a queue cannot be served at a bitrate less than 10 Mb/s. If the value of D₀ is set equal to 10 milliseconds instead of 1 millisecond, then a value of r=1 Mb/s would be permissible. The permissible burst size then lies between 10 kilobits and 320 kilobits.
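The numerical example above can be checked with a few lines of arithmetic; the following sketch is illustrative only and simply restates the bounds B₁=R×D₁, B₂=R×D₂ and the minimum sustainable bitrate R×D₁/D₀.

    R  = 10e9      # channel capacity, b/s
    D0 = 1e-3      # permissible burst-formation delay, s
    D1 = 1e-6      # minimum burst duration on the channel, s
    D2 = 32e-6     # maximum burst duration on the channel, s

    B1 = R * D1            # 10,000 bits  (10 kilobits)
    B2 = R * D2            # 320,000 bits (320 kilobits)
    r_min = B1 / D0        # 10 Mb/s: smallest bitrate meeting the D0 constraint

    # With D0 relaxed to 10 milliseconds, a 1 Mb/s allocation becomes permissible,
    # since a burst of at least B1 bits can then form within the delay bound:
    assert 1e6 * 10e-3 >= B1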

FIG. 9 illustrates an example of burst-size calculation. The bitrate-allocation requirements are represented by an N×N matrix, N being the number of edge nodes 120. The computed burst sizes are represented by an N×N matrix. Corresponding sub-matrices are illustrated in FIG. 9. The sub-matrix 920 containing bitrate allocations 922 for a subset of node pairs shows a wide variance of bitrate-allocation requests, with values ranging from 2 Mb/s to 3218 Mb/s. In this example, the permissible burst-formation delay D₀ is set equal to 2 milliseconds, the minimum burst duration, D₁, and the maximum burst duration, D₂, are set at 1.6 and 32 microseconds, respectively, and the capacity (speed) of the optical channel is 10 Gb/s. This results in a minimum burst size B₁ of 2 kilobytes and a maximum burst size B₂ of 40 kilobytes. It is noted that, under the constraint of the maximum burst-formation delay of 2 milliseconds, a bitrate of 2 Mb/s would result in a burst size of only 500 bytes and a bitrate of 3218 Mb/s would result in a burst size of about 800 kilobytes. With the D₁ and D₂ constraints, these sizes are adjusted to 2 kilobytes and 40 kilobytes respectively. The burst sizes corresponding to the bitrate allocations of sub-matrix 920 are given in sub-matrix 980.

Time-Locking in a Burst-Switching Composite-Star Network

In a wide-coverage network comprising electronic edge nodes interconnected by bufferless core nodes, where each edge node comprises a source node and a sink node, both sharing an edge-node controller and having means for data storage and managing data buffers, the transfer of data bursts from source nodes to sink nodes via the core nodes requires precise time coordination to prevent contention at the bufferless core nodes. A core node preferably comprises a plurality of optical switches each of which may switch entire channels or individual bursts.

As described earlier, a first node X is said to be time locked to a second node Y along a given path if, at any instant of time, the reading of a time-counter at node X equals the sum of a reading of an identical time-counter at node Y and the propagation time, normalized to the time-counter period, along the given path from node X to node Y, where the time counters at nodes X and Y have the same period. There may be several paths connecting the first node to the second node, and the paths may be defined by individual wavelengths in a fiber link or by several fiber links. Due to the difference in propagation delays of different paths connecting the same node pair, time locking may be realized for the different paths individually. Due to dispersion, time locking of individual paths may be required even for paths defined by wavelengths in the same fiber link. When a first node is time locked to a second node along a given path, said given path is said to be time-locked.
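The time-locking relation defined above can be expressed compactly; the following sketch is illustrative, and the tolerance argument is an added assumption to accommodate small timing errors rather than part of the original disclosure.

    def is_time_locked(reading_x, reading_y, propagation, period, tolerance=0):
        # All quantities are expressed in the same unit (e.g., clock ticks), and the
        # propagation delay is taken relative to the common time-counter period.
        expected = (reading_y + propagation) % period
        error = (reading_x - expected) % period
        return min(error, period - error) <= tolerance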

In order to be able to switch bursts arriving at a core node 140 from different source nodes 120A having different propagation delays to the core node, without contention or the need for burst storage at the core node 140, the edge nodes 120 must be time-locked to each optical switch 220 at a core node 140. A time-locking technique, also called time-coordination, is described in applicant's U.S. patent application Ser. No. 09/286,431, filed on Apr. 16, 1999, and titled SELF-CONFIGURING DISTRIBUTED SWITCH, the specification of which is incorporated herein by reference. With time locking, the scheduling method in accordance with the present invention guarantees that bursts arrive at already-free respective input-output ports of the optical switch 220. The time-locking in application Ser. No. 09/286,431 referenced above uses pre-assigned optical channels. In the present application, the method is adapted to the burst-switching mode.

Each source node has at least one time counter and each core node has at least one time counter. All time counters have the same period and time-coordination can be realized through an exchange of time-counter readings between each source node and its adjacent core node, i.e., the core node to which the source node is connected. The time-counter readings are carried in-band, alongside payload data bursts destined to sink nodes, and each must be timed to arrive at a corresponding core node during a designated time interval. The difficulty of securing time-coordination arises from two interdependent requirements. The first is that communicating a time-counter reading from a controller of a source node to a controller of a core node requires that the source node be time-locked to the core node, and the second is that time-locking a source node to a core node necessitates that a controller of the core node be able to receive a time-counter reading from the source-node controller during a designated interval of time. To initiate or restore time locking, a secondary mechanism is therefore required for directing upstream signals received from source nodes toward said master controller.

In a network where the edge nodes 120 and the core nodes 140 are collocated in a relatively small area, the propagation delay between any edge node 120 and a core node 140 can be substantially equalized, by equalizing the lengths of fiber links for example. In a network of wide geographic coverage, each edge node must adaptively time lock to the core nodes to which it connects. Time locking enables conflict-free switching, at a bufferless core node 140, of data bursts transmitted by a plurality of edge nodes 120 having widely varying propagation delays to the bufferless core node 140.

FIG. 10 illustrates a burst stream 1012 sent by an edge node 120 under normal operation. The burst stream comprises upstream control bursts 1020, one of which is indicated, and payload data bursts 1040, generally of different sizes. The bursts are formed by a source node 120A according to burst-transfer permits said source node receives after a predefined reconfiguration interval. As described with reference to FIG. 12, a new burst-transfer schedule may be generated during each reconfiguration interval. An upstream control burst 1020 generally contains timing data as well as other control data, and it includes the bitrate-allocation requests 530 described with reference to FIGS. 5 to 7. The size of the timing data would typically be much smaller than the size of the other control data carried by a control burst. During a time-locking recovery phase, the edge node 120 sends only a continuous stream 1014 of control bursts 1022.

Due to loss of time coordination, an upstream control burst is naturally shortened because it includes only timing data, and the duration of an upstream control burst would be less than half the time interval designated for receiving a control burst at control output port B₀. Thus, as indicated in FIG. 10, a control burst 1022, which is shorter than control burst 1020, can be acquired. It is noted that this time-locking acquisition method allows optical signals from several input ports to be processed in successive time slots allocated to control bursts. During a period of time equal to the duration of an upstream control burst 1020, control output port B₀ (FIG. 3) receives and acquires at least one complete shortened upstream control burst 1022, as indicated in FIG. 10 for shortened control burst 1022A.

FIG. 11 shows control components of a master controller 240. The two main components are a time-locking circuit 1160 and a master burst scheduler 1170. A control burst, which contains timing data, is scheduled like any other burst. The master burst scheduler 1170 is described below with reference to FIGS. 21 to 24.

The master controller 240 of an optical switch 220 includes a master time counter. The period of the master time counter is hereinafter called a master cycle. Each edge node also has a time counter that has the same period as the master cycle.

The edge nodes 120 communicating with an optical switch 220 in a core node 140 are time-locked to the master time counter of the optical switch 220. The burst-transfer schedules transmitted by the optical-switch master controllers 240 to the edge nodes 120 must be based on the time indication of the master time counter. The schedule period must, therefore, be locked to the master time counter. The selection of the master-cycle period and the schedule period is an important design choice. As described earlier, the master-cycle period exceeds the round-trip propagation delay between any two edge nodes 120. Thus, the maximum round-trip propagation delay dictates the master-cycle duration. In determining a lower bound of the master-cycle duration, a time period, of one millisecond or so, would be added to the maximum round-trip propagation delay to account for other delays along a round-trip path. With a time counter of W bits, the duration of the time-counter cycle is 2^W multiplied by a clock period. With W=32 and a clock period of 16 nanoseconds, for example, the number of counter states is about 4.29 billion and the time-counter period is more than 68 seconds. This is orders of magnitude higher than the round-trip propagation delay between any two edge nodes 120.
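The master-cycle arithmetic quoted above follows directly from the counter wordlength and the clock period, as in this illustrative sketch.

    W = 32                          # time-counter wordlength, bits
    clock_period = 16e-9            # seconds
    counter_states = 2 ** W         # about 4.29 billion states
    master_cycle = counter_states * clock_period
    print(master_cycle)             # about 68.7 seconds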

The master controller includes a detector operative to detect loss of time locking of any upstream optical signal, and secondary means for initiating and recovering time locking. In one implementation, said secondary means includes a device for sampling a succession of timing data delivered to the master controller through said space switch, as will be described with reference to FIG. 15. In another implementation, said secondary means includes a controller switch that diverts an upstream optical signal away from said space switch and towards the master controller, as will be described with reference to FIGS. 16 and 17.

A time-counter cycle is standardized across the network 100 so that each time counter, whether it resides at an edge node 120 or a core node 140, has the same wordlength (number of bits) and all are driven at the same clock rate. Some variation of the clock rate and wordlength can be accommodated.

The schedule period must exceed the duration of the longest burst received at a core node. In order to simplify time coordination between a core node and an edge node, it is preferable that a time-counter cycle period (master-cycle period) be an integer multiple J of the schedule period. Furthermore, it is preferable that the integer multiple J be a power of two.

FIG. 12 depicts a master-cycle period 1210, a reconfiguration period 1220, and a schedule period 1230 for an exemplary case of a master-cycle period that is exactly four times a reconfiguration period, where the reconfiguration period is exactly four times the schedule period.

As described above, the master-cycle period must exceed the round-trip delay between any two edge nodes. Preferably, the master-cycle period should be of the order of one second, and the reconfiguration period is preferably of the order of 100 milliseconds. The reconfiguration period must be sufficient to compute a burst-transfer schedule corresponding to a designated burst-transfer period. For an optical switch having a large number of nodes, the computation period 580 (FIG. 5) of a burst-transfer schedule may significantly exceed the designated schedule period. The reconfiguration period 1220 exceeds the period 580 and is selected to be an integer multiple, preferably a power of 2, of the designated schedule period. For example, if the schedule period 1230 is selected to be 2 milliseconds and it is estimated that the computation period 580 (FIGS. 5 to 7) is 11 milliseconds, i.e., 5.5 times the schedule period, then the reconfiguration period 1220 must be selected to be at least 12 milliseconds and the preferred reconfiguration period is 16 milliseconds (8 times the schedule period). Time alignment of the schedule cycle and the master cycle is essential, as indicated in FIG. 12. The number of schedule periods per reconfiguration period and the number of reconfiguration periods per master-cycle period are design options.

The alignment of the reconfiguration cycles with the master cycle is realized by selecting the master-cycle period to be an integer multiple of the reconfiguration period. The alignment is further simplified if said integer multiple is a power of 2. For example, if the period of the master cycle is represented by W bits and the reconfiguration period is represented by V bits, V&lt;W, then each reconfiguration cycle should start when the least-significant V bits of the master counter become all zeros.
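With the power-of-two choice above, the alignment test reduces to a simple mask of the master time counter, as in this illustrative sketch (variable names are assumptions).

    def at_reconfiguration_boundary(master_counter, v_bits):
        # True when the V least-significant bits of the master time counter are all zero,
        # i.e., at the start of a reconfiguration cycle of 2**V clock ticks.
        return (master_counter & ((1 << v_bits) - 1)) == 0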

Each output port of a source node 120A has a time counter, and the time counters of the output ports of a given source node 120A are independently time locked to respective optical switches 220 and, hence, may have different readings at any instant of time. Thus, the start time of a time counter in a source node 120A is output-port specific and adapts to an associated space switch 220. All time counters in the entire network 100 have the same period.

An upstream control burst 1020 sent from an output port of a source node 120A to an optical switch 220 is illustrated in FIG. 13. The upstream control burst 1020 may have several purposes such as conveying timing data and bitrate-allocation requests. The upstream control burst 1020 includes a conventional preamble 1302, typically of several bytes, to be used for message identification and acquisition, followed by a field 1304 that defines the purpose of the burst 1020. Field 1304 is preferably 4 bits wide, thus identifying 16 different functions of the upstream control burst 1020. Field 1306 contains a cyclic serial number which can be used for verification and further control functions. This is followed by a field 1308 indicating the size of the control burst. Field 1308 indicates the number K of subsequent bitrate-allocation requests included within the upstream control burst 1020, each bitrate-allocation request corresponding to a sink node 120B. Record 1310 has two fields 1312 and 1314. Field 1312 is an identifier of an output port of the source node. This would normally be the output-port number in the respective source node 120A that formed the upstream control burst 1020. Field 1314 is a time measurement determined as the reading of the time counter of the output port of the source node from which the upstream control burst 1020 is sent to the optical switch 220. The K bitrate-allocation requests are organized in records 530 (see FIGS. 5, 6, and 7), where each record 530 corresponds to a destination sink node 120B. Each record 530 contains three fields. A field 1322 contains an identifier of a destination sink node 120B, a field 1324 indicates a new bitrate-allocation requirement corresponding to the destination indicated in field 1322, and a field 1326 indicates a class of service. The destination identifier in field 1322 may either be associated with a current bitrate-allocation request or define a new one. The bitrate-allocation requests 530 are processed by a core-node controller 240A of a core node. An upstream control burst 1020 that carries bitrate-allocation requests 530 from a source node 120A is preferably sent directly to a core-node controller 240A. However, it can be sent to the master controller 240 of any optical switch 220 of the core node 140 because all the master controllers 240 of a core node 140, including the one functioning as a core-node controller 240A, are interconnected.
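
The field layout just described can be pictured as a small record structure. The following is a minimal sketch using hypothetical Python dataclasses; the attribute names mirror the reference numerals of FIG. 13, while field widths and encodings are left open:

    # Illustrative layout of the upstream control burst of FIG. 13.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class BitrateRequest:            # record 530
        sink_node_id: int            # field 1322: destination sink node 120B
        bitrate_bps: int             # field 1324: requested bitrate allocation
        service_class: int           # field 1326: class of service

    @dataclass
    class UpstreamControlBurst:      # burst 1020; shortened form 1022 carries no requests
        preamble: bytes              # field 1302: identification and acquisition
        purpose: int                 # field 1304: 4-bit function code (16 functions)
        serial_number: int           # field 1306: cyclic serial number
        source_output_port: int      # field 1312: output port of the source node
        time_counter_reading: int    # field 1314: reading of that port's time counter
        requests: List[BitrateRequest] = field(default_factory=list)  # K records 530

        @property
        def k(self) -> int:          # field 1308: number of bitrate-allocation requests
            return len(self.requests)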

Each upstream control burst 1020 or 1022 must include fields 1302, 1308, 1312, and 1314. An upstream control burst 1020 that is also used for bitrate allocations, and preferably communicated directly to a core-node controller 240A of a core node 140, includes a number of bitrate-allocation requests 530. As described earlier, each of the optical switches 220 of a core node 140 has a master controller 240, and a designated master controller functions as a core-node controller 240A and performs the bitrate-allocation control for all the space switches 220 of the core node 140. Each master controller 240 has a means for recording the reading of its own time counter at the instant at which it receives an upstream control burst 1020 or 1022.

FIG. 14 shows a format of a downstream control burst 1400 that a master controller 240 sends to a sink node 120B in response to an upstream control burst 1020. The first field 1442 is a conventional preamble. Field 1446, preferably 4 bits wide, defines the function of the downstream control burst 1400, which may carry timing data and burst-transfer permits, among other control data. The field 1448 indicates the number L of scheduled bursts reported in the downstream control burst 1400. A record 1450 contains a timing response that has at least three fields. The first field, 1452, contains an identifier of an output port of the source node associated with the upstream control burst 1020. The second field, 1453, contains the schedule-period number associated with the control burst 1020. The third field, 1454, contains the time at which the upstream control burst 1020 was received at the master controller 240 of optical switch 220. Each of the L records 540 (FIGS. 5, 6, and 7) has three fields. The first field 1472 indicates a burst start time relative to the schedule period. The second field 1474 indicates the burst length. The third field 1476 indicates the burst destination sink node 120B. A fourth field 1478 is optional and may be used to indicate to an edge node 120 receiving a downstream control burst 1400 an identifier of an optical switch 220 to which a burst is to be directed. Note that there is a one-to-one correspondence between an optical switch 220 and a port of the edge node 120. Field 1478 is optional because a controller of an edge node 120 receiving the downstream control burst 1400 can associate the input port at which the edge node 120 receives the downstream control burst with an optical switch 220 of a core node 140.

Node-Pair Time-Locking

The time-locking process in a time-shared network is described with the help of a two-node model. To realize time locking of a first node to a second node in a network, the first node is provided with a first controller that includes a first time counter, and the second node is provided with a slave controller and a master controller that includes a master time counter. The second node has several input ports and output ports, and the master controller is connected to one of the input ports and one of the output ports. The first controller sends an upstream control burst to an input port of said second node during a designated time interval, said upstream control burst including a reading of the first time counter. The upstream control burst is sent in-band, together with payload data bursts destined to output ports of the second node. The slave controller must be able to direct said upstream control burst to said master controller during a pre-scheduled time interval. The master controller has a device for acquiring and parsing upstream control bursts. The master controller compares the reading of the first time counter with a reading of the master time counter. An agreement of the two readings, or a negligible discrepancy, ascertains time alignment.

In the absence of time alignment, a time-locking recovery procedure must be initiated. The master controller sends a downstream control burst to said first controller to indicate the absence of time alignment. In response, the first node sends a succession of upstream control bursts, each including a reading of said first time counter. Meanwhile, the slave controller directs a sample of said upstream control bursts to said master controller during a pre-scheduled time interval, and the master controller acquires at least one upstream control burst from said sample and sends an identifier of an acquired upstream burst and a corresponding reading of the master time counter to the first controller. The identifier may be a serial number of the upstream burst, or a reading of the first time counter included in the upstream control burst. The first controller then resets the first time counter accordingly to restore the required time locking. During this recovery phase, the slave controller, which controls the connectivity of input ports to output ports of the second node, disconnects all paths to all output ports from the input port of the second node that connects to the first node.
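
The alignment test and the subsequent counter correction amount to a modular comparison over a common counter period. A minimal sketch, assuming an illustrative period and tolerance, is:

    # Sketch of the node-pair alignment test and the correction applied by the
    # first (edge-side) controller; the period and tolerance are assumptions.
    PERIOD = 1 << 20            # common time-counter period in clock ticks, assumed
    TOLERANCE = 4               # acceptable discrepancy in clock ticks, assumed

    def is_time_aligned(first_reading: int, master_reading: int) -> bool:
        """Readings taken at the same instant should agree within the tolerance."""
        diff = (master_reading - first_reading) % PERIOD
        # account for wrap-around near the end of the counter cycle
        return min(diff, PERIOD - diff) <= TOLERANCE

    def alignment_offset(first_reading: int, master_reading: int) -> int:
        """Amount, modulo the period, by which the first counter should be advanced."""
        return (master_reading - first_reading) % PERIOD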

The application of the time-locking process, described in the above two-node model, to the network of FIG. 1 is described below. Each edge node 120 assumes the role of the first node and each core node 140 assumes the role of the second node. A core node 140 may have several optical switches 220, and an upstream WDM link 210 from a source node 120A may switch burst streams through more than one optical switch 220. The source node 120A may lose its time-locking to one of the space switches 220 while still being time locked to the remaining space switches 220 of the core node 140.

Hereinafter, any mention of time-locking in a network of electronic edge nodes 120 and bufferless core nodes 140, each having a plurality of space switches (optical switches) 220, implies time locking of a port of a source node 120A of an edge node 120 to a space switch (optical switch) 220 of a core node 140.

Each scheduled control burst received at an optical switch 220 corresponds to a source node 120A, and the master controller 240 of said optical switch 220 parses the control burst to determine the source node and the source node's time-counter reading. In the notation used hereinafter, an edge node 120, labeled Ex, connects to an input port Ax and to an output port Bx of an optical switch 220, 1≤x≤N.

When the master controller 240 determines that the edge node Ex that connects to a port Ax is not time-locked to the optical switch, it instructs the slave controller 250 to discontinue burst transfer from input port Ax (314) to all output ports (384) B₁ to BN. The slave controller 250 continues to direct upstream control bursts 1020 received at port Ax to control output port B₀ during designated time intervals. The master controller 240 also sends a downstream control burst 1400 through control input port A₀ and output port Bx instructing edge node Ex to send a continuous sequence of control bursts, each including a reading of the time counter of edge node Ex.

During the periods scheduled for receiving, at control output port B₀, upstream control bursts 1020 from edge node Ex, the master controller 240 reads each control burst to acquire a time-counter reading (a time measurement) 1314 of a respective edge node. Once the time-counter reading 1314 from edge node Ex is detected, the master controller 240 sends a corresponding reading 1454 of the master time counter to edge node Ex. When the master controller 240 determines that edge node Ex is time locked to the master time counter, the master controller 240 instructs edge node Ex to resume sending payload data bursts starting at a predefined instant of time in the master cycle, and the master controller also instructs the slave controller to resume transferring data bursts from input port Ax at a corresponding instant of time, typically the start of a subsequent master cycle.

The method described above is illustrated in the block diagram of FIG. 15, which includes the main steps of time-locking acquisition for each edge-core node pair. The master controller 240 receives an upstream control burst 1020 from each edge node 120 through control output port B₀ (FIG. 3), as indicated in step 1510 of FIG. 15. The control burst is parsed to acquire a timing message in record 1310 that includes an identifier 1312 of an output port of an edge node 120 and the reading 1314 of the time counter of said edge node 120, as indicated in step 1520. There is a one-to-one correspondence between an output port of a source node 120A connecting to the optical switch 220 and an input port 314 of the optical switch 220. There is also a one-to-one correspondence between each output port 384 of the optical switch 220 and an input port of a sink node 120B connecting to the optical switch 220.

In step 1520, if the master controller 240 fails to acquire the timing message from an input port 314, as determined in step 1530, it initiates a time-locking recovery process and control is transferred to step 1532. If the input port 314 is already in a recovery mode, as determined in step 1532, then control is transferred to step 1510 to process a control burst from another input port 314. Otherwise, a time-locking recovery process is initiated. This requires executing the two main steps 1540 and 1550, to be described below, and the input port 314 through which the burst control message is received is marked as being in a recovery mode. Control is then transferred to step 1510.

In step 1520, if the master controller 240 succeeds in acquiring the timing message, as determined in step 1530, then control is transferred to step 1560 where the master controller verifies the operational state of the input port 314 through which the control burst has been received. If the input port 314 was operational in the previous verification, then nothing need be done and control is transferred to step 1510. If, however, the input port was marked as being in the recovery mode, i.e., the input port 314 has just completed a recovery process, then, in step 1570, the input port 314 is marked as operational and the master controller 240 also instructs a respective edge node 120, in step 1570, to return to normal operation by sending payload data bursts and control bursts according to current burst-transfer permits. In step 1580, the master controller 240 instructs the slave controller 250 to restore switching from the recovered input port 314 to control output port B₀ and output ports B₁ to BN.
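
Steps 1510 to 1580 can be summarized as a small per-port decision procedure. The sketch below assumes a simple two-state marking (operational/recovery), which is implied but not spelled out in the figure description:

    # Sketch of the per-input-port decision flow of FIG. 15 (steps 1510-1580).
    OPERATIONAL, RECOVERY = "operational", "recovery"

    def process_control_burst(port, state, acquired_timing_message):
        """Return the updated port-state map and the actions the master controller takes."""
        actions = []
        if not acquired_timing_message:                 # steps 1530/1532
            if state[port] != RECOVERY:
                actions += ["instruct edge node to send continuous control bursts (1540)",
                            "disconnect port from payload outputs, keep control path (1550)"]
                state[port] = RECOVERY
            return state, actions                       # back to step 1510
        if state[port] == RECOVERY:                     # steps 1560/1570/1580
            actions += ["instruct edge node to resume normal operation (1570)",
                        "restore switching from port to B0 and B1..BN (1580)"]
            state[port] = OPERATIONAL
        return state, actions                           # otherwise nothing to do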

In step 1540, the master controller 240 instructs the affected edge node 120, i.e., the edge node connecting to the affected input port Ax of the optical switch 220, to send a continuous stream 1014 (FIG. 10) of upstream control bursts 1022, each including a cyclic serial number 1306 and a timing message (record 1310 of FIG. 13). An upstream control burst 1022 is a shortened form of an upstream control burst 1020. The number K of bitrate-allocation requests (FIG. 13) is zero and, hence, records 530 are omitted. The serial number can be used to identify a corresponding reading 1314 of the time counter of the edge node. The duration of each control burst should be less than half the time interval designated for receiving a control burst, as illustrated in FIG. 10. The affected edge node then refrains from sending payload data bursts, i.e., bursts which would otherwise be directed to output ports B₁ to BN, during the recovery phase.

In step 1550, the slave controller 250 starts a recovery process by discontinuing the transfer of bursts from the affected input port 314 to the output ports B₁ to BN. The affected input port 314 is switched to the control output port B₀ during a time interval specified by the switching schedule of space switch 220. The signal received at control output port B₀ during the time interval designated for the affected input port is now suspected to contain data other than the required timing data. However, since the edge node 120 is now sending a continuous stream 1014 of control bursts of appropriate width, the master controller 240 can acquire at least one of the upstream control bursts 1022 and determine its serial number and the corresponding reading of the time counter of the edge node. The master controller then replies to the affected edge node 120, indicating the serial number of the control burst and the reading of the master time counter at the instant the selected control burst was acquired. Alternatively, instead of communicating a serial number of the control burst, the reply may include the time-counter reading received from the edge node and the corresponding reading of the master time counter of master controller 240. The edge node 120 can then adjust its time counter according to the timing data of the reply.

An alternate method of securing and maintaining time locking is to provide an access stage to the optical switch. The access stage can divert an incoming channel directly to the master controller 240 under certain conditions. FIG. 16 illustrates an optical switch having input ports A₀ to AN and output ports B₀ to BN, where input port A₀ is a control input port and output port B₀ is a control output port. The master controller 240 sends downstream control bursts 1400 to any of output ports B₁ to BN through an E/O interface 316, control input port A₀, and the optical switch 220, and the master controller 240 receives upstream control bursts 1020 from input ports A₁ to AN through a control switch 1610, the optical switch 220, control output port B₀, and an O/E interface 386.

The control switch 1610 has N receiving ports A₁ to AN and N sending ports 1612 connecting to N input ports of the optical switch. The control switch 1610 also has a number, n≤N, of ports 1614 connecting to the master controller through an O/E interface 1650. Typically, n is much smaller than N. The purpose of the control switch 1610 is to selectively divert an optical signal received at any of ports A₁ to AN to the master controller 240. At most n such signals can be diverted simultaneously.

A master controller 240 of an optical switch 220 detects loss of time locking of an edge node to the optical switch by comparing a received reading of a time counter of an output port of the edge node to the reading of a master time counter of master controller 240. The two readings should be identical, or be within an acceptable deviation from each other. When the master controller 240 determines that the source node 120A of a signal received at a port Ax is not time-locked to the optical switch 220, it instructs the control switch 1610 to divert the signal to one of the n input ports of the master controller. The master controller 240 reads the signal to identify an upstream control burst 1020 and, meanwhile, it sends a downstream control burst 1400 to the associated sink node of said source node to indicate the loss of time-locking. The downstream control burst 1400 is sent through the E/O interface 316, control input port A₀, the optical switch 220, and a downstream channel 224 from output port Bx. When the time-counter reading 1314 is detected, the master controller 240 sends the edge node Ex a downstream control burst 1400 including a corresponding reading of the master time counter. When the master controller 240 determines that the edge node Ex is time locked to the master time counter, i.e., when the received reading of the time counter of the edge node equals the reading of the master time counter, or is within an acceptable tolerance, the master controller 240 instructs the control switch 1610 to connect port Ax to the optical switch 220 and communication from edge node Ex is restored. It is noted that the signals sent on link 1630 from the master controller 240 to a connectivity controller (not illustrated) of the collocated control switch 1610 are electrical signals.

FIG. 17 illustrates the time-locking arrangement of FIG. 16 with a specific implementation of the control switch 1610. The control switch 1610 includes a number, N, of 1:2 optical switches 1720 with N outputs 1721 connecting to the input ports of the optical switch 220 and N outputs 1722 connecting to an n:N selector 1740. As mentioned above, the number n of control ports connecting directly to the master controller would be substantially less than N. For example, with N=256, two direct control ports (n=2) would suffice. In the event that more than n source nodes lose time-locking to the master time counter, the recovery process described above can be applied sequentially.

The master controller 240 of the optical switch 220 creates a schedule for receiving control bursts from each input port. According to the schedule, each of the source nodes 120A sending a burst stream to one of the input ports Ax must send control bursts at time instants indicated in the schedule. In order to send the control bursts precisely at the times determined from the schedule, each of the source nodes 120A connecting to a core node 140 must be time-locked to the specific optical switch 220 to which it is connected. Before time locking can be achieved for a given source node 120A, the source node sends a first timing message, indicating a reading of its time counter, to the master controller 240 of said specific optical switch 220, and obtains a reply message indicating the corresponding time-counter reading at the master controller 240 at the instant of receiving the first timing message. The reply message is initiated by the master controller 240, which sends a downstream message to a specific edge node. The first timing message is included in an upstream control burst 1020 and the reply message is included in a downstream control burst 1400. Time-locking is not required for downstream communications because the edge node (the sink node) can buffer the data it receives. The downstream message commands the edge node (the source node) to send a time-counter reading of a respective output port of the source node. This reading is basically an indication of the start of the time counter (the zero reading). An edge node 120 provides a time counter in each of its output ports that connect to core nodes 140. Referring to FIG. 17, the master controller 240 simultaneously sets a respective 1:2 optical switch 1720 and the optical selector 1740 so that the optical signal received from the source node is directed to an auxiliary port 1780 of the master controller 240. The optical signal is first converted to the electrical domain in O/E unit 1750 and the electrical signal is parsed to obtain the required timing data. Once the master controller 240 receives the timing data, it replies with the corresponding master time-counter reading 1454 within a downstream control burst 1400. The time counter at a corresponding output port of the source node is adjusted accordingly and time-locking is then realized. With n=1, for example, and when several source nodes are not time-locked to an optical switch 220 of a core node 140, the time-locking process just described is executed sequentially, one source node at a time.

FIG. 18 illustrates the required spacing of the upstream control bursts 1020 received at the N ports of an optical switch 220 so that control output port B₀ receives one control burst at a time. The spacing of upstream control bursts is required to ensure that there is no contention in accessing the master controller through control output port B₀ (FIG. 3). Downstream control bursts 1400 are naturally spaced because they are switched from a control input port A₀ to output ports B₁ to BN in consecutive time intervals. Because upstream control bursts 1020 carry control data of a predefined format, as indicated in FIG. 13, the upstream control bursts 1020, for different edge nodes, are preferably of the same size. Similarly, the downstream control bursts 1400 are preferably of the same size. This size uniformity facilitates the scheduling of the control bursts. It is emphasized that all schedules are produced, in the form of edge-node-specific burst-transfer permits 540 (FIG. 14), by the master controller 240 of the optical switch 220.

FIG. 19 illustrates the positioning of the control bursts within the scheduling cycles. To facilitate the scheduling, each control burst is placed at corresponding cyclic times in consecutive scheduling cycles. Only one timing control burst is normally required per time-counter cycle (master cycle). Any of the scheduling cycles within the master cycle may contain the timing control burst. A case where each reconfiguration period 1220 equals a schedule period 1230 (FIG. 12), where a time-counter cycle (master cycle) period 1210 includes eight scheduling periods, and where the fifth schedule period within a master cycle period contains a control burst 1020 that includes timing data, is indicated in FIG. 19. The control bursts in the remaining scheduling cycles are used for communicating control data between the edge-node controllers (not illustrated) and the master controller 240 of the optical switch 220. The reconfiguration period, in this example, is represented by V bits, the reconfiguration period equals the schedule period, and the master cycle period is represented by W bits, with W−V=3. The duration of the schedule period is 2^V clock periods and the duration of the master cycle is 2^W clock periods.
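
As a worked illustration of this relationship (the bit widths are assumed; the choice of the fifth schedule period follows the example above), the schedule index and the timing slot can be derived from the master counter alone:

    # Worked example of the FIG. 19 timing relationships; V and W are assumed.
    V = 17                              # schedule period = 2**V clock periods
    W = V + 3                           # master cycle = 2**W clock periods

    schedules_per_master_cycle = 1 << (W - V)        # = 8
    assert schedules_per_master_cycle == 8

    def schedule_index(master_count: int) -> int:
        """Index (0..7) of the current schedule period within the master cycle."""
        return (master_count >> V) & (schedules_per_master_cycle - 1)

    def carries_timing_data(master_count: int) -> bool:
        """True during the fifth schedule period (index 4) of the master cycle."""
        return schedule_index(master_count) == 4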

FIG. 20 illustrates a core node 140 having four optical switches 220 where some input ports in each optical switch 220 operate in a channel-switching mode and the remaining input ports operate in a burst-switching mode. An incoming fiber link 210 carries four wavelengths that are demultiplexed and carried by internal fiber links 2012/2014 to input ports 314 of the optical switches 220. Two of the four wavelengths, referenced as 2014, are channel-switched to corresponding output ports of respective optical switches 220, and the other two wavelengths, referenced as 2012, carry data bursts that are individually switched to arbitrary output ports of respective optical switches 220, said arbitrary output ports excluding output ports that receive switched channels. When the channel-switched connections are evenly distributed among the optical switches, the burst-scheduling computational effort is evenly distributed among the master controllers 240 of the optical switches 220.

Each burst-mode input port switches a succession of data bursts to several output ports. A channel-mode input port switches a succession of data units of any format to a single output port in a unicast connection, or to several designated output ports in a multicast connection. Basically, a channel is set up and retained for an extended period of time. Channel scheduling in the arrangement of FIG. 20 is preferably performed according to a packing process where the search for an optical switch 220 that can accommodate a required path starts from the same optical switch 220 in a core node 140. It is known that such a packing discipline increases network utilization by increasing the opportunity of matching a free input channel 2014 in an upstream link 210 to a free output channel 2050 in a multi-channel link 230. In contrast, burst-mode connections are preferably allocated equitably among the optical switches 220 of each core node 140. The reason is that the bottleneck in burst switching can be the burst-scheduling effort. While packing increases utilization, it also increases the scheduling effort. The scheduling effort is, however, relatively insignificant in channel switching in comparison with burst switching. The use of packing for channel switching must be constrained, so that the number of channel connections per optical switch 220 is limited, to permit a balanced distribution of burst-switched connections among the optical switches 220 of a core node 140.

In order to enable burst switching, a time-locking process is applied as described with reference to FIG. 15 or FIGS. 16 and 17. Channel switching does not require time-locking if the switching pattern is not modified frequently. Without time locking, channel switching requires that the corresponding source node refrain from sending data over a period of time sufficient to exchange messages with a respective optical switch 220 and implement the switching change at the optical switch 220. For example, if the switching pattern changes every hour, then allowing an idle period of about 80 milliseconds for reconfiguration results in relatively low waste. However, for adaptive channel switching, where the switching pattern changes at a relatively high rate, every 100 milliseconds for example, a guard time of 80 milliseconds would be excessive and a guard time of only a few microseconds would be permissible between successive switching changes. Scheduling switching-pattern changes with a small guard time requires that the edge nodes be time-locked to the optical space switches 220 to which they are connected.

Due to the varying propagation speeds of different wavelengths, the propagation-delay difference between wavelengths within the same WDM link may be significant, and strict time locking would be required for each wavelength that is switched at a burst-mode port in an optical switch 220. Time-locking of a single wavelength channel is enabled by upstream control bursts 1020 (FIG. 13) and downstream control bursts 1400 (FIG. 14). This applies only to channels 2012 which operate in the burst-switching mode. A wavelength channel that is switched in its entirety in an optical switch 220 of core node 140 cannot access the master controller of the optical switch 220 and hence cannot acquire precise time-locking. Relaxed time-locking can, however, be realized by association with other precisely time-locked channels. Thus, a guard time at least equal to the difference in propagation delay between any two wavelengths may be applied between successive changes of the channel-switching pattern at the optical switch 220. For example, link 210 in FIG. 20 carries two channels, referenced as 2012, that lead to burst-mode input ports of optical switches 220A and two channels, referenced as 2014, to optical switches 220B. The output ports of a source node 120A from which link 210 emanates can be precisely time locked to optical switches 220A. If it is estimated that the maximum differential propagation delay of the channels within link 210 is 2 microseconds, for example, then an adaptive reconfiguration of any of optical switches 220B requires an idle period of only 2 microseconds. Thus, this associative time-locking can significantly reduce the idle period between successive reconfigurations.

Periodic Burst-Schedule Generation

Applicant's U.S. patent application Ser. No. 09/750,071, filed on Dec. 29, 2000, and titled "Burst Switching in a High-Capacity Network", describes a burst-switching network wherein source nodes request connections to be established through an optical switch and, at a master controller of the optical switch, the requests are compared to other such requests so that a schedule may be established for access to the optical switch. The schedule is then sent to the source nodes as well as to a slave controller of the optical switch. Data bursts are received at the optical switch at a precisely determined instant of time that ensures that the optical switch has already reconfigured to provide requested paths for the individual bursts. The scheduling is pipelined and performed in a manner that attempts to reduce mismatch intervals of the occupancy states of input and output ports of the optical switch. The method thus allows efficient utilization of the data network resources while ensuring virtually no data loss.

In the aforementioned method, the computation of a burst-transfer schedule takes place after the bursts are received at their source nodes and their descriptors are communicated to the master controller of the optical switch. In a network of wide geographic coverage, the bursts may have to wait for a significant period of time at their respective source nodes. Thus, large buffers would be needed at the edge nodes and the resulting delay may be excessive. Furthermore, the speed of computing the burst-transfer schedule must be sufficiently high to handle the combined rate of receiving data bursts at the optical switch from all source nodes connecting to the optical switch. This requirement reduces the scalability of the network. The method of computing the burst-transfer schedule according to the present invention improves the above method and significantly increases the scheduling capacity.

The core-node controller 240A of a core node 140 receives upstream control bursts 1020 (FIG. 10) from each source node 120A. The upstream control bursts contain bitrate-allocation requests (record 1310 of FIG. 13) from the source nodes 120A. The bitrate-allocation requests received at the core-node controller 240A from the input ports of the optical switch 220 are allocated to individual space switches 220 in a way that ensures that none of the output ports (384) Bx, 1≤x≤N, is overbooked, i.e., the combined bitrate allocation for each sink node reached via an output port Bx of a space switch 220 does not exceed the capacity of the downstream channel from the output port Bx to said sink node. If the sum of bitrate allocations for a given output port (384) Bx exceeds its capacity, some requests must be reassigned to a different space switch 220. Bitrate-allocation requests may be modified or even rejected.
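
The no-overbooking condition can be pictured as a per-output-port capacity check. The sketch below assumes a fixed downstream channel capacity and a simple first-fit reassignment, which is only one possible realization of the allocation step:

    # Sketch of the no-overbooking rule: the bitrate allocated to an output port Bx
    # of any space switch must not exceed the capacity of its downstream channel.
    CHANNEL_CAPACITY_BPS = 9_800_000_000     # per-port payload capacity used in a later example

    def assign_to_space_switches(requests, num_switches):
        """requests: list of (output_port, bitrate_bps); returns assignments and rejects."""
        load = [{} for _ in range(num_switches)]      # load[s][port] = allocated bitrate
        assignments, rejected = [], []
        for port, bitrate in requests:
            for s in range(num_switches):             # first space switch with room on that port
                if load[s].get(port, 0) + bitrate <= CHANNEL_CAPACITY_BPS:
                    load[s][port] = load[s].get(port, 0) + bitrate
                    assignments.append((s, port, bitrate))
                    break
            else:
                rejected.append((port, bitrate))       # request must be modified or rejected
        return assignments, rejected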

In one embodiment, descriptors of bursts already waiting at edge nodes are sent to a core-node controller 240A of a core node 140, which assigns the bursts to different space switches 220 of the core node 140 and distributes the bursts to corresponding burst-descriptor memories 2210 (FIGS. 23 and 24).

In another embodiment, the core-node controller 240A of a core node 140 assigns burst streams, each having an allocated bitrate, to the individual master controllers 240 of the space switches 220 of the core node 140, and each master controller 240 generates burst descriptors based on said bitrates. The bitrate-allocation requests are directed to corresponding burst-stream generators within the master controller 240. Each burst-stream generator generates an unconstrained schedule of tentative burst-transfer permits on the basis of the required bitrate and the corresponding burst size as described earlier with reference to FIG. 8. An unconstrained schedule applies to a sequence of burst descriptors corresponding to a single source node, without coordination, for access to output ports Bx, with burst sequences generated by the remaining source nodes. The generated burst descriptors are then directed to respective burst-descriptor memories 2210. The function of the master burst scheduler 1170 is to modify the burst timing so that output-port contention at the optical switch 220 is avoided.

FIG. 21 illustrates a burst-generator bank 2100 having multiple burst-stream generators 2120 for generating burst descriptors of bitrate-regulated burst streams associated with a plurality of source nodes. Each source node 120A connecting to a core node 140 sends the core-node controller 240A of said core node a vector of burst-stream descriptors, each burst stream being associated with a destination sink node 120B. A burst-stream descriptor includes a destination sink node, a bitrate allocation, and a class of service in fields 1322, 1324, and 1326, respectively (FIG. 13). A burst-stream generator 2120 determines a burst descriptor for each burst in a burst stream based on the method described above with reference to FIG. 8. In addition, the burst-stream generator 2120 generates a tentative time table for switching bursts corresponding to said burst descriptors. The tentative time table is based on the bitrate allocation for the burst stream. The method of generating the tentative time table is described below with reference to FIGS. 35 and 36. The tentative time tables received from the plurality of burst-stream generators 2120 are multiplexed by multiplexer 2130 and placed in a burst-descriptor memory 2210 for use by a master burst scheduler 1170. The burst-descriptor memory 2210 may be a single memory or a bank of memories, as will be described with reference to FIGS. 22, 23, and 24.

The outputs of the N burst-stream generators 2120, each associated with an input port 314, are multiplexed and presented to the burst-descriptor memory 2210 of FIG. 22. The burst-stream generators for different ports 314 (hence different source nodes 120A) function independently and need not be time coordinated.

FIG. 22 is a block diagram of an apparatus for burst-schedule generation. In general, the apparatus may be used either to schedule bursts based on burst descriptors received from source nodes 120A or to generate burst-transfer permits based on burst descriptors generated at the master controller 240 of an optical switch 220. In the latter case, rather than forming the bursts at the source nodes 120A and then scheduling their transfer to an optical switch 220, the process is reversed: burst-transfer permits are generated at a controller of an optical switch 220 and distributed to a plurality of edge nodes 120. The generation of burst-transfer permits would be based on burst-stream descriptors generated by the edge nodes 120; such descriptors may include parameters such as bitrate allocations and class of service but do not include individual burst descriptors.

Burst descriptors are generated for each burst stream, where each burst stream is allocated a bitrate. The generated burst descriptors are stored in a burst-descriptor memory 2210. An input-state memory 2220 holds an input-state array having N records; each record corresponds to an input port 314 of the optical switch 220 and indicates the time at which that input port will become free. Similarly, an output-state memory 2240 holds an output-state array having N records; each record corresponds to an output port 384 of the optical switch 220 and indicates the time at which that output port 384 will be free. Under control of the processing circuit 2250, a scheduling kernel 2280 determines the switching time for each burst represented by a burst descriptor waiting in the burst-descriptor memory 2210. Each burst descriptor specifies an input port 314 and an output port 384, and the burst switching time is determined as the larger of the time at which the input port becomes free, as read from the input-state memory 2220, and the time at which the output port becomes free, as read from the output-state memory 2240.

In order to maximize the utilization of the optical switch 220, and hence the utilization of the upstream optical channel 214 and downstream optical channel 224 (FIG. 2), the absolute value of the difference between the free time of the input port 314 and the free time of the corresponding output port 384 should be minimized. The scheduling kernel 2280 can reduce the absolute value of this difference by examining several burst descriptors belonging to the same input port and selecting a burst descriptor according to a prescribed criterion, such as the minimum absolute difference.
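
Stated compactly, the kernel schedules a burst at the later of the two free times and, among a few candidate descriptors of the same input port, prefers the one whose output-port free time is closest to the input-port free time. The following is an illustrative rendering of that rule; the data structures and function name are assumptions:

    # Sketch of the kernel rule: schedule at max(input_free, output_free), choosing
    # among the candidate descriptors the one minimizing |input_free - output_free|.
    def schedule_one_burst(input_free, output_free, candidates):
        """
        input_free:  free time of the common input port 314
        output_free: dict mapping output port 384 -> its free time
        candidates:  list of (output_port, burst_duration) descriptors (up to Q entries)
        Returns (chosen index, transfer time, next free time of both ports).
        """
        best = min(range(len(candidates)),
                   key=lambda i: abs(input_free - output_free[candidates[i][0]]))
        port, duration = candidates[best]
        transfer_time = max(input_free, output_free[port])
        return best, transfer_time, transfer_time + duration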

In order to implement multiple-burst-descriptor processing without slowing down the scheduling process, the burst-descriptor memory 2210 is implemented as several independent memories 2310, as illustrated in FIG. 23, each of which stores burst descriptors related to a subset of input ports 314 of the optical switch 220. FIG. 23 illustrates the use of five burst-descriptor memories 2310, each holding burst descriptors associated with a subset of input ports 314, corresponding to a subset of source nodes 120A. Each of the burst-descriptor memories 2310 of FIG. 23 has an associated register (not illustrated) that can hold several burst descriptors, four for example. The five registers are visited cyclically. Thus, the use of separate memories 2310 allows the scheduling kernel 2280 to select several burst descriptors from each memory and place them in a register so that they can be read in parallel when a register is sampled by processing circuit 2250. The output-state memory 2240 may also be implemented as several memories 2340, as indicated in FIG. 23, all holding identical data. This allows simultaneous computation of the absolute free-time differences as described above. When a burst descriptor is selected, and its switching time determined, a burst-transfer permit is generated and placed in a permits buffer 2282. The burst descriptor is dequeued from the respective burst-descriptor memory 2310 and the switching time is entered in a corresponding record in the input-state memory 2220 and in a corresponding record in each of the output-state memories 2340. The output-state memories 2340 generally have different read addresses but the same write address.

An input-state memory 2220 holds an input-state array 2520 (FIG. 25) having N records, N being the number of input ports 314, and each record contains an indication of the instant of time at which an input port 314 of an optical switch 220 would be available to transmit a burst to an output port 384 of the optical switch 220. An output-state memory 2340 holds an output-state array 2540 (FIG. 25) having N records, each record indicating the instant of time at which the output port of the optical switch 220 would be available to start receiving a burst from one of the input ports 314. In order to reduce the time required to schedule a burst, several output-state memories 2340 may be used for parallel reading as described above with reference to FIG. 23. The parallel output-state memories 2340 are identical, each containing the same timing data.

The burst-transfer permits placed in the permits buffer 2282 are communicated to respective edge nodes 120 via control input port A₀, the optical switch 220, and output ports B₁ to BN (FIG. 3). The edge nodes 120 receive burst-transfer permits from the master controllers 240 of several optical switches 220 belonging to several core nodes 140, form data bursts according to the permits they receive, and transmit the formulated data bursts to selected optical switches 220 of a selected core node 140 according to the timing indicated in the permits. The scheduling kernel 2280 generates a connection timetable corresponding to the permits and, after a calculated period of time, submits the timetable to the slave controller 250 (FIG. 2), which establishes a connection from an input port 314 to an output port 384 of a space switch 220 for each data burst precisely at the time of arrival of the data burst. The applied delay (the calculated period of time) at the slave controller 250 must exceed the round-trip delay between the core node and the edge node. The bursts generated by a burst-stream generator 2120 are grouped into burst sets, where each burst set occupies a schedule period 1230 (FIG. 12). The burst-scheduling kernel 2280 performs the main scheduling task, where the bursts of the generated burst sets are scheduled for switching from their input ports 314 to the designated output ports 384. Contention avoidance is realized with the help of the input-state array 2520 and output-state array 2540 (FIG. 25). The function of the burst-scheduling kernel 2280 will be described with reference to FIG. 24.

FIG. 24 illustrates a slightly different implementation of the scheduling apparatus of FIG. 23. Each input port 314 operating in burst mode directs upstream control bursts 1020 to the master controller 240 through output port B₀. After optical-to-electrical (O/E) conversion, the control data are received in electrical form at interface 2408. The burst-scheduling device includes a bank of independent burst-descriptor generators 2412. Each burst-descriptor generator 2412 includes a burst-generator bank 2100, each of which is associated with a burst-descriptor memory 2310. A register 2424 that can hold a predefined number, Q, of burst descriptors is associated with each memory 2310. Each of the burst-generator banks 2100 is associated with a number of input ports of the optical switch 220. A burst-generator bank 2100 receives bitrate allocations related to a plurality of source nodes 120A and generates a sequence of burst descriptors as described earlier with reference to FIG. 21. The bitrate allocations are distributed by the core-node controller 240A to all other master controllers 240 of the same core node 140. Recall that a core-node controller 240A is one of the master controllers 240 selected to perform the added function of distributing the burst-scheduling task among the master controllers 240 of a core node 140. As described above, the burst descriptors may alternatively be determined by the source nodes 120A and placed directly in respective burst-descriptor memories 2210/2310.

The Q burst descriptors are read sequentially from a burst-descriptor memory 2310 and placed in a register 2424 that can hold the Q descriptors for further parallel processing. This process takes place concurrently in all burst-descriptor generators 2412. In an optical switch 220 that has a small number, N, of input ports, 32 for example, only one burst-descriptor generator 2412 would be required. With a large number, N, of input ports, 256 for example, the use of parallel burst-descriptor generators, each handling a subset of the N input ports, allows concurrent placement of burst descriptors in registers 2424. Burst scheduling is performed by circuit 2250.

A comparator 2480 receives the time at which an input port 314 is free, as read from the input-state memory 2220, and the times at which candidate output ports 384 are free, as read from the parallel output-state memories 2340. The comparator 2480 then selects one of the output ports 384 of the optical switch 220 and returns an identifier of the selected port, as well as the transfer time of the corresponding burst, to processing circuit 2250 and adder 2438, as indicated by the symbols 'A' and 'B' in FIG. 24, corresponding to reference numerals 2433 and 2435, respectively. It is possible that two or more of the candidate bursts be destined for the same output port.

The upstream control bursts 1020 include, amongst other information, requests for modifying bitrate allocations from each edge node 120 connecting to a port operating in the burst mode. As mentioned above, there is a burst-stream generator 2120 associated with each input port 314 that operates in the burst mode. The bitrate allocations received at interface 2408 are directed to respective burst-stream generators 2120 within burst-generator bank 2100. Each burst-stream generator 2120 independently generates descriptors of bursts destined to output ports of the optical switch and forms queues of the burst descriptors in an associated memory 2310. A number Q>1 of burst descriptors is dequeued from the head of each queue in a memory 2310 and placed in a register bank 2424, and the Q burst descriptors can be read in parallel from each register bank 2424. A preferred value of Q is 4. A large value of Q improves utilization at the expense of circuit complexity. A cyclic selector 2414 visits each register bank 2424 during a specified interval of time and directs the Q burst descriptors to processing circuit 2250, which determines the read address in an output-state memory 2340 for each of the Q burst descriptors. The instant of time at which each of the Q output ports identified in the Q burst descriptors is free to receive data is read from a respective memory 2340 and compared in the comparator circuit 2480.

Comparator circuit 2480 selects the output port for which the absolute value of the difference between the input-port availability time T1 and the output-port availability time T2 is the lowest. This selection increases the scheduler efficiency. For example, if T1=12000 and four output ports corresponding to four burst descriptors read from a register 2424 have availability times of 11200, 12700, 12284, and 10020, then the deviations from the input availability time are −800, 700, 284, and −1980. The minimum absolute deviation is 284 (not −1980) and the corresponding output port is selected. Thus the burst would be scheduled for transfer at time 12284 (the larger of 12284 and 12000). If, in the above example, the time T1=11500, then the deviations are −300, 1200, 784, and −1480, and the minimum absolute deviation is −300. Thus, the burst would be scheduled for transfer at time 11500 (the larger of 11500 and 11200).
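
The deviations and transfer instants quoted in this example can be verified directly; the small check below is illustrative only and not part of any apparatus:

    # Direct check of the numeric example above.
    def pick(T1, avail):
        dev = [a - T1 for a in avail]                     # deviations from input availability
        i = min(range(len(avail)), key=lambda k: abs(dev[k]))
        return dev, i, max(T1, avail[i])                  # transfer time = later of the two

    avail = [11200, 12700, 12284, 10020]

    dev, i, start = pick(12000, avail)
    assert dev == [-800, 700, 284, -1980] and start == 12284

    dev, i, start = pick(11500, avail)
    assert dev == [-300, 1200, 784, -1480] and start == 11500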

When one of the candidate burst descriptors is selected, the time at which both the input port and the output port specified in the selected burst descriptor will next be available is computed and is used to overwrite the corresponding current values in the input-state memory 2220 and the parallel output-state memories 2340. This calculation is done as follows. The comparator circuit 2480 determines the candidate output port corresponding to the minimum absolute deviation and the burst transfer time as described above. Comparator circuit 2480 then reports the selected output to processing circuit 2250 and the corresponding burst transfer time to adder 2438. Processing circuit 2250 has the burst duration for each of the Q candidate burst descriptors and it inputs the duration of the selected burst to adder 2438. The output of adder 2438 is the next availability time of both the input port and the output port for the selected burst. This value is used to update the corresponding entries in input-state memory 2220 and output-state memory 2340 of FIG. 24. The descriptor of the selected burst is then removed from the corresponding burst-descriptor memory 2310.

FIG. 25 illustrates the input-state and output-state arrays 2520 and 2540, respectively, at some intermediate instant in the schedule period. As mentioned earlier, input-state array 2520 is stored in input-state memory 2220 (FIGS. 23 and 24) and output-state array 2540 is stored in each output-state memory 2340 (FIGS. 23 and 24). When a burst is scheduled, its termination time is shown at the respective entries in the input-state array 2520 and the output-state array 2540, as illustrated in FIG. 25. Upon burst termination, the corresponding entries in the input-state array 2520 and output-state array 2540 are available for other bursts, generally with different connections. Thus, it is possible that an entry in the input-state array 2520 does not appear simultaneously in the output-state array 2540.

In overview, methods of scheduling the transfer of data bursts among edge nodes, having buffering facilities, through bufferless core nodes are devised to reduce processing effort and increase overall network efficiency. At each core node, each of several burst schedulers determines, using parallel comparisons, the proximity of the available times of selected input ports and selected output ports indicated in a set of candidate burst descriptors and schedules a data burst according to said proximity. In a preferred mode of operation of a burst-switched network, rather than sending requests to schedule data bursts after they are received at a respective source node, each source node determines the bitrate requirements for paths to each sink node and sends bitrate-allocation requests to a selected core-node controller, which computes burst-transfer permits and sends the permits to corresponding edge nodes. This reduces the scheduling delay while avoiding data loss at the core node.

Routing

As described earlier, a burst stream is defined by its source node, sink node, and a path from the source node to the sink node. In the network of FIG. 1, there are several paths from each source node to each sink node through different core nodes 140, and there are several paths within each core node 140, each path being defined by an input port 314 and an output port 384 of a space switch 220. The capacity of a single path equals a channel capacity, typically 10 Gigabits/second (Gb/s).

The data to be transferred from a source node to a sink node may have to be allocated to several paths if the required capacity exceeds the capacity of a single path. Even when the required capacity is less than the capacity of a single path, the data from a source node to a sink node may still be transported over several paths due to contention.

Reducing the number of paths used by each node pair (source node to sink node) results in increasing the mean burst size and, hence, reducing the mean burst rate. The transport capacity of a burst switch, i.e., the total bitrate received at input and released at output, is curtailed by the processing capacity of its burst scheduler. A given burst scheduler can schedule a given number of bursts per second that is virtually independent of the burst sizes. Thus, increasing the mean burst size, as described above, increases the transport capacity of a burst scheduler. To illustrate, consider a core node 140 having 8 space switches 220, each having N input ports 314 connecting to N upstream channels and N output ports 384 connecting to N downstream channels, with a payload capacity, excluding control overhead, of R=9.8 Gb/s for each input or output port. The total transport capacity of the node is then 8×N×R. In this example, the core node 140 transfers node-pair data of equal bitrate allocations of 200 Mb/s each. With a burst-formation period D₀ of 1 millisecond (as described with reference to FIG. 8), the burst length is 200 kilobits and the burst rate per upstream channel is 49 kilo-bursts per second, which is the rate R=9.8 Gb/s divided by the burst length of 200 kilobits. The total burst rate per space switch is then 49×N kilo-bursts per second. If each bitrate allocation of 200 Mb/s is instead transferred evenly over the 8 space switches of the core node, the mean burst size would drop to 25 kilobits and the burst rate per space switch would increase to 392×N kilo-bursts per second. If the burst scheduler in the master controller 240 of each space switch 220 can only schedule 10,000,000 bursts per second, then the number N of input ports (or output ports) would be about 200 in the first case and 25 in the second case. It is preferable, therefore, that the core-node controller 240A attempt to assign the data of each node pair (source node to sink node) to the smallest number of space switches 220 within the core node 140.
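
The arithmetic of this example can be checked directly; the quantities below are exactly those quoted in the text:

    # Verification of the scheduler-capacity example: burst length, per-channel
    # burst rate, and the resulting port counts for a 10 M bursts/s scheduler.
    R = 9.8e9                   # payload capacity per port, b/s
    allocation = 200e6          # per node-pair bitrate allocation, b/s
    D0 = 1e-3                   # burst-formation period, s
    scheduler_rate = 10_000_000 # bursts per second one scheduler can handle

    burst_bits = allocation * D0                  # 200 kilobits when one path is used
    rate_per_channel = R / burst_bits             # 49,000 bursts/s per upstream channel
    ports_one_path = scheduler_rate / rate_per_channel     # about 200 ports

    burst_bits_split = (allocation / 8) * D0      # 25 kilobits when spread over 8 switches
    rate_split = R / burst_bits_split             # 392,000 bursts/s per channel
    ports_split = scheduler_rate / rate_split     # about 25 ports

    assert round(rate_per_channel) == 49_000 and round(rate_split) == 392_000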

As described above, a core node 140 has a plurality of parallel space switches (optical switches) 220, each having N input ports and N output ports, and connects at most N multi-channel (multi-wavelength) upstream links to at most N multi-channel (multi-wavelength) downstream links. In order to confine connections from each upstream link to each downstream link to a small number of space switches, the core-node controller 240A sorts the bitrate requirements associated with each upstream link in descending order according to bitrate value, then implements a cyclic allocation of said requirements to corresponding paths of the space switches in a manner that attempts to equalize the burst rate per space switch. In a case where the remaining unassigned capacity in a path is insufficient to accommodate a bitrate requirement, a part of the requirement may be assigned and the remainder of the unassigned bitrate is retained for a subsequent path. The process may be applied iteratively, with the bitrate allocations per iteration used as a progress indicator, until all bitrate-allocation requirements are met or further progress cannot be made.
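
One illustrative rendering of a single allocation pass is sketched below; the descending sort and the cyclic visit over the space switches follow the description, while the data structures and the treatment of leftovers are assumptions:

    # Single pass of the sort-then-cyclic allocation for one upstream link.
    def allocate_link_requirements(requirements, num_switches, path_capacity):
        """requirements: list of bitrates (b/s) for one upstream-to-downstream link pair."""
        residual = [path_capacity] * num_switches      # unassigned capacity per path
        assignments, leftovers = [], []
        for i, req in enumerate(sorted(requirements, reverse=True)):
            s = i % num_switches                       # cyclic visit of the space switches
            assigned = min(req, residual[s])           # assign what fits in this path
            if assigned > 0:
                residual[s] -= assigned
                assignments.append((s, assigned))
            if req > assigned:                         # remainder retained for a later path
                leftovers.append(req - assigned)
        return assignments, leftovers                  # leftovers feed a further iteration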

The scalability of the core node 140 is determined, in part, by the speed of the burst-scheduling function. The scheduling method described above requires that all registers 2424 be visited during a period not exceeding the mean burst duration. In each visit to a burst-descriptor memory 2210/2310, a single burst is scheduled. Thus, in an optical switch of 256×256 capacity, with all ports operating in a burst-switching mode, and with a mean burst duration of 8 microseconds for example, a burst must be scheduled within 8/256 microseconds, i.e., about 30 nanoseconds. Note that a single scheduler in each master controller 240 of an optical switch 220 handles bursts from all 256 input ports of the optical switch 220. In order to allow more computation time per burst, the time allocated for computing a burst-transfer schedule can be extended to an integer multiple m of the designated schedule period. In the above example, if the designated period is 16 milliseconds, and the value of m is chosen to be 8, then about 240 nanoseconds would be available to schedule a burst. Thus, the time of computing a schedule exceeds the real-time period covered by the schedule by a factor of m. The schedule-computation period is then 128 milliseconds with m=8, and bitrate updates would be processed every 128 milliseconds, i.e., the reconfiguration period 1220 is 128 milliseconds.

In accordance with the present invention, two methods can be used to increase the scheduler capacity. In the first method, a schedule is computed, for a succession of bursts generated over a schedule period T, every m schedule periods, where the value of the integer m exceeds the ratio of the time required to compute said schedule to said designated schedule period T. As described earlier, the succession of bursts may be generated according to bitrate allocations for each burst stream to be switched from a burst-mode input port 314 to an output port 384. The bitrate allocations are then refreshed periodically every m×T interval. FIG. 26 illustrates the generation of a schedule for switching data bursts, over a designated schedule period T, from a burst-mode input port to an output port. The schedule for switching data bursts is used repetitively during m consecutive periods, m being an integer greater than zero, and each of said consecutive periods is equal to said designated schedule period T. FIG. 26 illustrates the correspondence of a schedule period 1230 and the corresponding computation period 2630. Referring to FIGS. 12 and 26, the reconfiguration period 1220 is at least equal to the schedule-computation period 2630.

In the second method, illustrated in FIG. 27, the computation period for each of said successive time intervals is an integer multiple m of the interval T and m successive schedules are computed concurrently using at least m scheduling devices 1170 (FIG. 11 and FIGS. 22 to 24). The value of m exceeds the time required to compute said schedule for each time interval T divided by the designated time interval T. The schedule may be computed for burst descriptors generated according to bitrate allocations for each pair of burst-mode input port 314 and output port 384, and the bitrate allocations are refreshed every interval T. As illustrated in FIG. 27, the time separation of successive schedule periods equals T. FIG. 27 illustrates schedule periods 1230-A, 1230-B, etc., and corresponding computation periods 2710-A, 2710-B, etc. Referring to FIGS. 12 and 27, the reconfiguration period 1220 is at least equal to any of the computation periods 2710-A, 2710-B, etc., which correspond to schedule periods 1230-A, 1230-B, etc. If m=1, the two methods become equivalent, and input-state arrays 2520 and output-state arrays 2540 should not then be zero-initialized, since scheduling takes place continually in the time domain. For m>1, in both the first method and the second method above, each input-state array 2520 and each output-state array 2540 must be zero-initialized because of the discontinuity of the scheduling process. This discontinuity requires that the termination time of each burst be confined within the schedule period.

To enable repetitive use of the same schedule over successive designated schedule periods, according to one embodiment, front-end burst scheduling is used where no bursts are scheduled for switching during the interval between T−d and T, where T is the length of the designated schedule period 1230 (FIG. 12, FIG. 26, and FIG. 27), and d is the maximum burst duration (32 microseconds for example). The value of T is 16 milliseconds in the above example. A burst that is switched at time (T−d) or earlier would then be completely transferred from an input port 314 to an output port 384 of the optical switch 220 before the end of the designated schedule period. The possible waste due to a partially used interval between (T−d) and T would typically be insignificant. In the above example, the relative waste is less than 32/16000, i.e., less than 0.002. According to another embodiment, trailing-end burst scheduling is used where the comparator 2480 computes the termination time of a burst and ensures that it is within the designated schedule period. Thus, a burst may be scheduled after the instant (T−d) if its duration is less than d. FIG. 28 illustrates front-end burst scheduling (FIG. 28a) and trailing-end burst scheduling (FIG. 28b) over a scheduling period T, as described above. FIG. 28a illustrates the instant of time 2820, relative to the start of a schedule period 1230 (FIG. 12), beyond which no bursts are scheduled. FIG. 28b illustrates trailing-end scheduling where a burst can be scheduled anywhere within the schedule period 1230 as long as its termination time does not exceed the end 2830 of the schedule period 1230.
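
The eligibility tests implied by the two embodiments can be written down directly. The following Python fragment is a minimal sketch assuming the example values T=16 ms and d=32 microseconds; the function names are illustrative only.

    T = 16e-3   # designated schedule period: 16 ms
    d = 32e-6   # maximum burst duration: 32 microseconds

    def may_schedule_front_end(start_time: float) -> bool:
        # Front-end scheduling: no burst may start during the last d units
        # of the schedule period, regardless of its actual duration.
        return start_time <= T - d

    def may_schedule_trailing_end(start_time: float, duration: float) -> bool:
        # Trailing-end scheduling: a burst may start anywhere in the period
        # provided its termination time stays within the period.
        return start_time + duration <= T

    print(may_schedule_front_end(T - 10e-6))            # False: start falls inside the guard interval
    print(may_schedule_trailing_end(T - 10e-6, 8e-6))   # True: a short burst still terminates in time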

It is noted that, if the number of ports 314/384 per space switch 220 is sufficiently small, and/or if the capacity per port 314/384 is low, the core-node controller 240A of a core node 140 may divide the task of scheduling the space switches 220 of the core node among a subset of the master controllers. As described earlier, the master controllers 240 in a core node 140 are interconnected and, hence, can exchange computed schedules.

Adaptive Burst Formation

In Applicant's U.S. patent application Ser. No. 09/735,471, filed on Dec. 14, 2000 and titled "Compact Segmentation of Variable-Size Packet Streams," a method is described for segmenting a data stream comprising variable-size packets, a data stream being defined by its source node, sink node, assigned network route, and other attributes. The segments are of equal size, and the method concatenates the packets in successive segments in a manner that attempts to minimize segmentation waste without undue buffering delay. The method facilitates the construction of efficient networks while respecting service-quality specifications. Herein, the method is adapted to enable efficient formation of variable-size data bursts at an edge node 120.

FIG. 29 illustrates a source node 120A and a sink node 120B of an edge node 120. Traffic sources (not illustrated) send data packets of arbitrary sizes, within the restrictions of respective protocols such as IPv4 or IPv6, to the ingress ports 2910 of the source node. The data packets may be switched through a switching fabric 2920 of the source node to output ports 2930 interfacing with the network core nodes 140. An incoming data packet may be transferred to an output port 2930 across the switching fabric 2920 of the source node in the same format in which the packet is received at an ingress port. Alternatively, the data packet may be segmented into data blocks of equal size to simplify the design of the switching fabric 2920. This process may result in partially-filled data segments. A partially-filled data segment is also called an incomplete segment. The data packets received at an output port 2930 are sorted into output queues according to destination sink node 120B. The output queues (not illustrated), each corresponding to a destination sink node 120B, preferably share a common memory within the port 2930. Regardless of the method of internal packet switching within the source node 120A, the data packets in an output queue are aggregated into data bursts, as will be detailed below with reference to FIG. 31 and FIG. 32. Typically, a data burst would include a large number of individual data packets.

Each output queue of a source node 120A is associated with a single destination, a destination being a sink node 120B in any edge node 120, and is allocated a bitrate at which the queue is served. The allocated bitrate for each queue is determined by an admission controller. The bitrate allocations for the output queues of a given output port 2930 may vary significantly. For example, one queue may be allocated a bitrate of 10 Mb/s (megabits per second) while another queue in the same output port is allocated 5 Gb/s (gigabits per second). Burst formation takes place at each output port 2930 of the source node 120A. The selection of a burst size has a significant effect on the burst-transfer processing effort and the efficiency of the links connecting the edge nodes to the core nodes. At a given bitrate allocation, large bursts result in a reduced burst-generation rate, hence less relative header overhead and higher transport efficiency. A low burst rate reduces the processing effort at the controllers of the output ports 2930 of the source node 120A and, most importantly, at the core-node master controllers 240, as described with reference to FIG. 24.
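
The trade-off can be illustrated numerically. The Python sketch below is not the burst-size selection procedure of FIGS. 8 and 9; it merely shows how an allocated bitrate, an assumed permissible burst-formation delay, and an assumed guard-time-related floor jointly bound a permissible burst size.

    def permissible_burst_size_bytes(bitrate_bps: float,
                                     max_formation_delay_s: float = 1e-3,
                                     min_burst_duration_s: float = 2e-6,
                                     channel_rate_bps: float = 10e9) -> int:
        # Delay-limited size: the bits the allocated bitrate can accumulate
        # within the permissible burst-formation delay.
        delay_limited_bits = bitrate_bps * max_formation_delay_s
        # Guard-time floor: a minimum size so that the burst is not dwarfed
        # by the switch reconfiguration time on the optical channel.
        guard_floor_bits = channel_rate_bps * min_burst_duration_s
        return int(max(delay_limited_bits, guard_floor_bits) // 8)

    print(permissible_burst_size_bytes(10e6))   # 10 Mb/s queue: the guard-time floor dominates
    print(permissible_burst_size_bytes(5e9))    # 5 Gb/s queue: the formation-delay limit dominates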

A sink node 120B receives data bursts at input ports 2970 and switches them in segmented format through switching fabric 2940 to egress ports 2980.

Data bursts are switched to the input ports 2970 of sink nodes 120B through the optical core nodes 140. The bursts received at the input ports 2970 of each sink node may be of substantially different sizes. At each input port 2970 of a sink node, each received burst must be parsed into its constituent individual packets, and the individual packets are switched through the internal fabric 2940 of the sink node to egress ports 2980, to be delivered to their intended data sinks.

As illustrated in FIG. 30, each source node 120A may be paired with a sink node 120B, with which it shares a switching fabric 3020 and a controller (not illustrated), to form an edge node 120. The integration of a source node with a sink node facilitates intra-edge-node switching and closed-loop control and management communications with the network core. Closed-loop paths are needed to exchange certain control data between an edge node 120 and a core node 140.

As described earlier, each output port 2930 has a time counter to enable time-locking the output port to a core node. An output port 2930 may have a bank of time counters, one associated with each core node 140.

FIG. 31 illustrates a device 3100 for aggregating packets into bursts. The device includes an enqueueing controller 3110, a dequeueing controller 3180, a burst-transfer scheduler 3150, a control memory 3120, an auxiliary data memory 3130, and a principal data memory 3140. One device 3100 is provided at each output port of a source node 120A.

To facilitate switching within the source-node fabric 2920 (or common fabric 3020), packets received at the ingress ports 2910 are segmented in a conventional manner, and the segments are switched through the switching fabric 2920 (or 3020) of the electronic source node 120A. The data received at each ingress port 2910 is formatted into equal-size data segments of a predetermined size G; G=128 bytes for example. A data segment may be complete or null-padded. However, the null padding is removed in the process of burst formation at the output ports 2930, as will be described below.
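
For reference, ingress segmentation of the kind described above can be sketched as follows. The routine is illustrative only and is not taken from the cited segmentation application; the names are hypothetical.

    G = 128  # predetermined segment size in bytes, per the example above

    def segment_packet(packet: bytes, seg_size: int = G):
        # Split a packet into equal-size segments; the last segment is
        # null-padded when the packet length is not a multiple of seg_size.
        # Returns the segments and the payload length of the last segment.
        segments, last_payload = [], 0
        for i in range(0, len(packet), seg_size):
            chunk = packet[i:i + seg_size]
            last_payload = len(chunk)
            segments.append(chunk + bytes(seg_size - len(chunk)))
        return segments, last_payload

    segs, tail = segment_packet(bytes(300))
    print(len(segs), tail)   # 3 segments; the last carries 44 payload bytes and 84 bytes of padding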

FIG. 32 illustrates the organization of the control memory 3120, the auxiliary data memory 3130, and the principal data memory 3140 of FIG. 31. Array 3230, stored in the auxiliary data memory 3130, has N records, N being the number of sink nodes, each record storing an incomplete segment destined to sink node j, 0≤j<N. Array 3240, stored in the principal data memory 3140, has a sufficient number of records to store all data ready for transfer to the plurality of sink nodes. Each record has two fields. A first field, P(1, j), contains an identifier of the record in which a new data segment destined to sink node j, 0≤j<N, is to be written. The second field, P(2, j), contains a complete data segment.

There are N records in array 3220, stored in the control memory 3120, each record having two fields 3212 and 3214. The first field 3212 contains a value C(1, k) indicating the number A(k) of data bytes in an incomplete segment waiting in record k of the auxiliary array 3230, the record corresponding to destination sink node k. The second field 3214 contains a pointer C(2, k) to the record in the principal array 3240 in which the first segment of a burst to be transferred to destination sink node k is stored. It is noted that there can be only one incomplete segment waiting in memory 3130 for a given destination sink node 120B. Therefore, the number of records in array 3230 need not exceed N, N being the number of destination sink nodes as described earlier.

A complete data segment is directed to the principal data memory 3140, to be placed in array 3240, if the corresponding record in the auxiliary array 3230 is vacant. Otherwise, the complete data segment is merged with the incomplete segment stored in the corresponding record of the auxiliary array 3230. This process may result in adding a complete segment, if any, to the principal memory and storing the remainder, if any, in the corresponding entry of the auxiliary memory. An incomplete new segment is always merged with the content of the auxiliary memory, and the merged data is divided into a complete segment, if the size of the merged data exceeds a segment size, to be directed to the principal data memory 3140, and an incomplete segment of u bytes, to be stored in the auxiliary memory if u>0. To simplify the design, the burst sizes (burst lengths) are restricted to be integer multiples of a basic unit, which may be selected to be a data segment. A burst may occupy several records in the principal data memory 3140.
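
The roles of the three memories can be summarized with a simple model. The Python sketch below keeps the C(1, k) and C(2, k) fields but replaces the interleaved linked lists of the principal array with ordinary per-stream lists; it is a simplification for illustration, not the memory organization itself, and the class and field names are hypothetical.

    from dataclasses import dataclass, field

    G = 128   # segment size in bytes
    N = 4     # number of destination sink nodes (kept small for illustration)

    @dataclass
    class OutputPortMemories:
        # Control memory 3120 / array 3220: C(1, k) and C(2, k) per sink node.
        c1: list = field(default_factory=lambda: [0] * N)       # bytes in the waiting incomplete segment
        c2: list = field(default_factory=lambda: [None] * N)    # head of stream k's queue in the principal array
        # Auxiliary data memory 3130 / array 3230: at most one incomplete segment per sink node.
        aux: list = field(default_factory=lambda: [b""] * N)
        # Principal data memory 3140 / array 3240: complete segments, modelled here
        # as one Python list per sink node instead of interleaved linked lists.
        principal: list = field(default_factory=lambda: [[] for _ in range(N)])

    mem = OutputPortMemories()
    print(mem.c1, mem.aux)   # all streams empty: no waiting fractional data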

FIG. 33 illustrates the process of storing a new packet received at an ingress port 2910 of a source node 120A. The packet is first associated with one of the predefined burst streams. A burst stream may be defined according to destination and a selected path through a core node. For the purpose of burst formation, all burst data from a source node to a sink node are treated as a single burst stream. When a packet is received, it is segmented in a conventional manner at the ingress port 2910. The data segments received at an output port 2930 of a source node include both complete and incomplete segments. An incomplete segment has less data than the defined segment size and is null padded. The segments are processed individually. The stream identifier, k, and the payload length, L, of the segment (which excludes any null padding) are determined. The two fields C(1, k) and C(2, k), corresponding to entries in the auxiliary and principal data memories of FIG. 31, are read simultaneously from the control memory 3120. A value C(1, k) of 0 indicates that there is no fractional segment belonging to stream k waiting in the auxiliary memory 3130. Thus, in step 3310, if C(1, k) is determined to be zero, control is transferred to step 3320; otherwise, control is transferred to step 3330. In step 3320, if the length L is determined to be equal to the predefined segment length G (G=128 bytes for example), the segment is stored directly in the principal data array 3240, which is organized as interleaved linked lists (step 3324). In effect, the principal data array 3240 constitutes a number of interleaved queues. (Interleaved linked lists are well known in the art and are not described here. Basically, they allow dynamic sharing of a memory by X>1 data streams using X insertion pointers and X removal pointers.) Otherwise, if in step 3320 the value of L is determined to be less than a full-segment length G, the fractional segment is placed in position k of the auxiliary array 3230 (step 3322). Note that, at this point, position k of the auxiliary array 3230 is vacant, because step 3320 is reached only when C(1, k) is determined to be zero. The fractional segment will remain in the auxiliary array 3230 until it is either concatenated with a forthcoming segment of the same stream k, or dequeued by the burst-transfer scheduler 3150, whichever occurs first. If, on the other hand, the entry C(1, k) is found in step 3310 to be greater than zero, the enqueueing controller 3110 concludes that there is a waiting fractional segment belonging to stream k. The arriving segment, whether complete or fractional, is then concatenated with the existing fractional segment (step 3330). In step 3332, if the result equals or exceeds a full segment, a full segment is appended directly to the corresponding queue in the principal array 3240, which can hold several interleaved queues, each corresponding to a sink node. If the remainder of the concatenation is greater than zero, the remainder is placed back in position k of the auxiliary array 3230 (step 3335). If the remainder is zero, the corresponding entry C(1, k) in array 3220 is set equal to zero (step 3333) to indicate to a future arriving segment that there is no waiting fractional segment belonging to stream k. It is noted that the interleaved linked lists are addressed independently but share the same memory device 3140.
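
Continuing the simplified model sketched earlier, the enqueueing decision of FIG. 33 can be rendered as follows. The interleaved-linked-list bookkeeping is omitted and each per-sink queue is an ordinary Python list, so this is an illustration of the steps rather than the controller itself.

    def enqueue_segment(mem: OutputPortMemories, k: int, payload: bytes) -> None:
        # Steps 3310-3335, simplified: merge the arriving segment (complete or
        # fractional, null padding already removed) with any waiting fractional
        # data of stream k, emit complete segments to the principal queue, and
        # keep at most one fractional remainder in the auxiliary record.
        if mem.c1[k] == 0:                      # step 3310: nothing waiting for stream k
            if len(payload) == G:               # steps 3320/3324: complete segment
                mem.principal[k].append(payload)
            else:                               # step 3322: park the fractional segment
                mem.aux[k] = payload
                mem.c1[k] = len(payload)
            return
        merged = mem.aux[k] + payload           # step 3330: concatenate with waiting data
        while len(merged) >= G:                 # step 3332: emit any full segment
            mem.principal[k].append(merged[:G])
            merged = merged[G:]
        mem.aux[k] = merged                     # steps 3333/3335: remainder, possibly empty
        mem.c1[k] = len(merged)

    enqueue_segment(mem, 0, bytes(100))         # fractional: waits in the auxiliary record
    enqueue_segment(mem, 0, bytes(100))         # merge yields one full segment and a 72-byte remainder
    print(len(mem.principal[0]), mem.c1[0])     # prints: 1 72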

FIG. 34 is a flow chart showing the dequeueing of segments to form bursts under rate control. Note that the enqueueing process of FIG. 33 is triggered by a packet arrival at an output port 2930, while the dequeueing process of FIG. 34 is triggered by the burst-transfer scheduler 3150, which indicates the service eligibility of each burst stream. When the burst-transfer scheduler 3150 indicates that a stream k is eligible for burst transfer, the corresponding burst length for stream k is determined. The burst length, Y, is determined as an integer multiple of a segment length. The selection of the burst length was described with reference to FIGS. 8 and 9. A counter is set equal to Y and decreased in steps of unity as segments are dequeued from the principal data array 3240 and/or the auxiliary data array 3230. When the counter reaches zero, the dequeueing of the burst is complete.

To dequeue a segment, two single-bit numbers S1 and S2 are determined (3412) by a simple logic circuit (not illustrated). S1 equals 0 if C(1, k)=0, and equals 1 otherwise. S2 equals 0 if C(2, k)=0, and equals 1 otherwise. Selector 3414 selects one of three branches based on the value of {S1, S2}, as illustrated in FIG. 34. If the 2-bit number {S1, S2} is "00", the dequeueing controller 3180 (FIG. 31) concludes that there are no segments belonging to stream k waiting in either the auxiliary array 3230 or the principal data array 3240. It then returns a code "0" to the burst-transfer scheduler 3150 (FIG. 31). The burst-transfer scheduler 3150 may use the return code to terminate burst dequeueing from memories 3130 and 3140 when the number of dequeued segments, which may include a fractional segment, is less than the number of segments specified by a master controller 240. The burst-transfer scheduler 3150 may also use the return code to perform other functions specific to its internal operation. If the number {S1, S2} is "10", the dequeueing controller 3180 concludes that there is a fractional segment in the auxiliary data array 3230 but no segments belonging to stream k in the principal data array 3240. In step 3422, the entry C(1, k) is reset to zero and the fractional segment waiting in the auxiliary data memory 3130 at entry k is transferred to the network through selector 3436 and outgoing link 3440.

If the number {S1, S2} is either "01" or "11", the dequeueing controller 3180 concludes that there is a complete segment belonging to stream k waiting in the principal data memory 3140 (principal data array 3240). Control is then transferred to step 3432. The existence, or otherwise, of a waiting fractional segment belonging to stream k in the auxiliary data memory 3130 is irrelevant. The complete segment is then transferred from the principal data memory 3140, as indicated in step 3432, through selector 3436 and outgoing link 3440. Normal bookkeeping functions, such as returning the address H=C(2, k) to the pool of free addresses in memory 3140, are performed in step 3434.
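
Continuing the same simplified model, the dequeueing branch of FIG. 34 and the rate-controlled burst assembly driven by the counter Y can be sketched as follows. Return codes, selector 3436, and the free-address pool are represented only loosely; the selection of complete segments ahead of the fractional one reflects the branch order described above under the stated simplifications.

    def dequeue_segment(mem: OutputPortMemories, k: int):
        # One pass of FIG. 34, simplified. Returns the next segment of stream k,
        # or None (standing in for return code "0") when nothing is queued.
        s1 = 0 if mem.c1[k] == 0 else 1          # fractional segment waiting?
        s2 = 0 if not mem.principal[k] else 1    # complete segment queued?
        if (s1, s2) == (0, 0):                   # "00": stream k is empty
            return None
        if (s1, s2) == (1, 0):                   # "10": only a fractional segment exists
            mem.c1[k] = 0                        # step 3422
            fragment, mem.aux[k] = mem.aux[k], b""
            return fragment
        # "01" or "11": a complete segment is waiting (step 3432); any fractional
        # segment is left in place. Freeing the record address (step 3434) is
        # implicit in this list-based model.
        return mem.principal[k].pop(0)

    def dequeue_burst(mem: OutputPortMemories, k: int, y_segments: int) -> list:
        # Rate-controlled dequeueing: the burst-transfer scheduler grants Y
        # segments to stream k; dequeueing stops early if the stream runs dry.
        burst = []
        for _ in range(y_segments):
            segment = dequeue_segment(mem, k)
            if segment is None:
                break
            burst.append(segment)
        return burst

    print(len(dequeue_burst(mem, 0, 4)))   # 2: one complete segment plus the 72-byte fractional remainder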

The embodiments of the invention described above are intended to be exemplary only. Other modifications will be apparent to those skilled in the art and, therefore, the invention is defined in the claims.

What is claimed is:
1. A communication network comprising: a plurality of edge nodes; and a plurality of core switching nodes, each core switching node comprising at least one optical switch having no traffic buffers, a selected core switching node of the plurality of core switching nodes being configured to: receive, from a source edge node of the plurality of edge nodes, a bit rate allocation request specifying a destination edge node of the plurality of edge nodes; generate, in response to the bit rate allocation request, at least one burst transfer permit, each burst transfer permit specifying a respective permissible burst size, an inter-burst interval and the destination edge node; and send the at least one burst transfer permit to the source edge node.
2. The network of claim 1, wherein the source edge node is configured to: receive, from the selected core switching node, the at least one burst transfer permit; assemble at least one data burst having a burst size not exceeding the respective permissible burst size specified in the burst transfer permit; and send the at least one data burst to the selected core switching node.
3. The network of claim 2, wherein the respective permissible burst size has an upper bound constrained by a maximum jitter tolerance of data constituting the assembled data bursts.
4. The network of claim 2, wherein: each burst transfer permit specifies an arrival time for a data burst at the selected core switching node; and the source edge node is configured to send the at least one data burst from the source edge node to the selected core switching node by timing the sending of a data burst based on the arrival time specified in a corresponding burst transfer permit.
5. The network of claim 1, wherein the selected core switching node is configured to: receive, from the edge node, at least one data burst based at least in part on the at least one burst transfer permit; and optically switch the at least one data burst to the destination edge node.
6. The network of claim 1, wherein: the data of data bursts switched by the selected core switching node comprises data of at least one service; and the respective permissible burst size has an upper bound constrained by a delay tolerance of the at least one service switched by the selected core switching node.
7. The network of claim 1, wherein the respective permissible burst size has a lower bound much larger than a time required for reconfiguring the at least one optical switch and a maximum arrival time error for data bursts arriving at the at least one optical switch.
8. The network of claim 1, wherein each burst transfer permit specifies an arrival time for a data burst at the selected core switching node.
9. The network of claim 1, wherein the plurality of core switching nodes are configured to: receive, from respective source edge nodes of the plurality of edge nodes, a plurality of respective bit rate allocation requests, each respective bit rate allocation request specifying a respective destination edge node; generate, in response to the respective bit rate allocation requests, respective burst transfer permits, each respective burst transfer permit specifying a respective permissible burst size, a respective inter-burst interval and a respective destination edge node, each respective permissible burst size having an upper bound constrained by a delay tolerance of services switched by the core switching nodes; and send the respective burst transfer permits to the respective source edge nodes.
10. The network of claim 9, wherein the edge nodes of the plurality of edge nodes are configured to: receive, from respective core switching nodes, respective burst transfer permits; assemble, in response to the respective burst transfer permits, respective data bursts having burst sizes not exceeding the respective permissible burst sizes specified in the respective burst transfer permits; and send the respective data bursts from the respective source edge nodes to the respective core switching nodes.
11. The network of claim 10, wherein: each respective burst transfer permit specifies a respective arrival time for a respective data burst at the respective core switching node; and the respective edge nodes are configured to send the respective data bursts to the respective core switching nodes by timing the sending of the respective data bursts based on respective arrival times specified in respective corresponding burst transfer permits.
12. The network of claim 9, wherein the respective core switching nodes are configured to: receive, from the respective edge nodes, respective data bursts based at least in part on the respective burst transfer permits; and switch the respective data bursts to respective destination edge nodes.
13. The network of claim 9, wherein the respective permissible burst sizes have a lower bound much larger than a time required for reconfiguring the optical switches and a maximum arrival time error for data bursts arriving at the optical switches.
14. The network of claim 9, wherein each respective edge node is configured to simultaneously transfer respective data bursts to plural core switching nodes of the plurality of core switching nodes.
15. The network of claim 14, wherein at least some edge nodes of the plurality of edge nodes are co-located with respective core switching nodes of the plurality of core switching nodes, and the plurality of edge nodes and the plurality of core switching nodes are configured to equalize propagation delays from respective edge nodes to respective core switching nodes.
16. The network of claim 14, wherein the plurality of edge nodes and the plurality of core switching nodes are configured to time lock respective edge nodes to respective core switching nodes.
17. The network of claim 9, wherein the plurality of edge nodes and the plurality of core switching nodes are configured to: associate respective data bursts with respective burst streams; and size the respective data bursts based at least in part on at least one attribute of the respective burst streams.
18. The network of claim 17, wherein the at least one attribute of the respective burst streams comprises a service class.
19. The network of claim 9, wherein the plurality of edge nodes and the plurality of core switching nodes are configured: to associate respective data bursts with respective burst streams; and to switch all respective data bursts of each respective burst stream in a respective core switching node.
20. The network of claim 1, wherein the bit rate allocation request specifies a requisite bit rate.
21. The network of claim 20, wherein the selected core switching node is configured to determine an update bit rate to replace the requisite bit rate.
22. The network of claim 1, wherein the plurality of core switching nodes is arranged in a composite star configuration.